Preface


This book provides sufficient material for a one-semester linear algebra course
at the sophomore level. It is based on the lecture notes for the linear algebra
course that the author taught for several years to undergraduate students in
science and mathematics at the University of Texas at Dallas. The level and pace
of the course can be adjusted by balancing the time spent on theory against the
time spent on the computational aspects of the subject. The author usually taught
up to Chapter 7, spending one lecture per section on average; the remaining two
chapters can be left as reading homework for students or as supervised
individual study.

It seems that many undergraduate students take only one linear algebra
course before graduation and may therefore miss many important topics of
linear algebra, a gap that can be remedied later by self-study on demand. This
book is written both for classroom teaching, in order to deliver the essential
topics of the subject effectively, and for self-study beyond a first linear
algebra course.

The following is an introduction to each chapter of the book. 


1. Chapter 1 deals with vectors, linear combinations and dot products in
$\mathbb{R}^n$. In Section 1.3 we discuss matrix representations for linear systems
and for elementary row operations.


2. Chapter 2 illustrates Gaussian elimination and Gauss-Jordan elimination
for solving linear systems, along with basic matrix theory, LU-decomposition
and permutation matrices.


3. Chapter 3 starts with four subspaces of $\mathbb{R}^n$ associated with a real matrix.
Then we discuss bases and dimensions of general vector spaces.


4. Chapter 4 deals with orthogonality between subspaces. Related topics
include the matrix representation of orthogonal projection, least squares
solutions, the Gram-Schmidt process and QR-decomposition.


5. Chapter 5 presents an axiomatic method of determinants which naturally
leads to the permutation formula, cofactor expansion, the product
formula and Cramer's rule.


6. In Chapter 6 we introduce the notions of eigenvalues and eigenvectors,
which open the door to more applications of linear algebra, including
the immediate applications to diagonalizability, spectral decomposition
of symmetric real matrices, quadratic forms, positive definite matrices
and the Rayleigh quotient.




7. Chapter 7 continues to discuss the application of eigenvalues and eigen- 
vectors and presents singular value decomposition of general matrices. 
Principal component analysis is also introduced as a real-world applica- 
tion of linear algebra. 


8. Chapter 8 discusses the matrix representation, range and null spaces 
for linear transformations on general vector spaces. Then we introduce 
invariant subspaces, decomposition of vector spaces and Jordan normal 
form and its computation, where the treatment of the Jordan normal 
form does not require a formal exposition of polynomial theory. 


9. Chapter 9 presents basic theory of linear programming along with the 
simplex method which is another concrete real-world application of lin- 
ear algebra and which has been widely used in management and industry. 


The book covers the typical topics of linear algebra courses and can be used
in many ways, depending on the mathematical background of the audience.
It provides a limited number of examples and exercises, so it is best
suited to readers who would like a broad coverage of the topics of linear
algebra and who are motivated to devise their own questions on the material of
each section. Comments and suggestions from readers are highly appreciated and
can be sent by e-mail to qingwen@utdallas.edu.


Qingwen Hu 
January 2017 


Chapter 1 


Vectors and linear systems 


1.1 Vectors and linear combinations
1.2 Length, angle and dot products
1.3 Matrices


A central goal of linear algebra is to solve systems of linear equations. We
have seen the simplest linear equation $ax = b$, where $x \in \mathbb{R}$ (the symbol "$\in$"
means "in") is the unknown variable and $a, b \in \mathbb{R}$ are constants. It is known
that there are three scenarios for the solutions: 1) if $a \neq 0$, there is a unique
solution $x = b/a$; 2) if $a = 0$ and $b \neq 0$, there is no solution; 3) if $a = b = 0$, there
are infinitely many solutions. We are then motivated to investigate systems of
equations with multiple unknown variables. The following system


\[
\begin{aligned}
x + 2y + 3z &= 3\\
2x + 5y + 8z &= 9\\
3x + 6y + 18z &= 18
\end{aligned}
\tag{1.1}
\]


is a system of linear equations with three equations and three unknowns. In 
this chapter, we learn how to use vectors to represent a linear system and 
learn the ideas of elimination which will be applied to solve systems of linear 
equations. The general form of linear systems is as follows: 


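\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
\tag{1.2}
\]

where $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is the unknown vector in $n$-dimensional
Euclidean space; $a_{ij}$ and $b_i$ with $i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$ are
constants.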


1.1 Vectors and linear combinations 


Before we discuss how to solve general linear systems, we use system (1.1) 
as a prototype to introduce the machinery of vectors. One may rewrite
system (1.1) as
\[
x\begin{bmatrix}1\\2\\3\end{bmatrix} + y\begin{bmatrix}2\\5\\6\end{bmatrix} + z\begin{bmatrix}3\\8\\18\end{bmatrix} = \begin{bmatrix}3\\9\\18\end{bmatrix}.
\tag{1.3}
\]


System (1.3) makes sense only if we have defined addition and scalar
multiplication of vectors in Euclidean spaces, where we have identified the vector
$(x_1, x_2, x_3)$ with the column of numbers
\[
\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix},
\]
which is called a column matrix. In what follows we will always regard a
vector in $\mathbb{R}^n$ as a column matrix.


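Definition 1.1.1. Let $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$ be
vectors in $\mathbb{R}^n$ and $a$ a scalar. We define addition $x + y$ and scalar
multiplication $ax$ by

\[
\begin{aligned}
x + y &= (x_1 + y_1,\ x_2 + y_2,\ \ldots,\ x_n + y_n),\\
ax &= (ax_1,\ ax_2,\ \ldots,\ ax_n).
\end{aligned}
\]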


Definition 1.1.2. Let $x_1, x_2, \ldots, x_n \in \mathbb{R}^N$ be vectors, and
$c_1, c_2, \ldots, c_n \in \mathbb{R}$ be scalars. We call

\[
c_1x_1 + c_2x_2 + \cdots + c_nx_n
\]

a linear combination of $x_1, x_2, \ldots, x_n$.


System (1.3) now can be interpreted as finding a proper linear combina- 
tion of the vectors (1, 2, 3), (2, 5, 6) and (3, 8, 18) to produce the given vector 
(3, 9, 18) on the right hand side. Certainly we can also interpret it as finding 
the common point (x, y, z) of three planes determined by each of the equa- 
tions. If we visualize a linear system with this interpretation of a linear system, 
we obtain a row picture, while with the previous one, a column picture. 
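
The column picture can be tried out numerically. The following Python sketch is only an illustration: the helper name `linear_combination` and the trial coefficients $(-2, 1, 1)$ are our own choices, not part of the text. It forms a linear combination of the three column vectors of system (1.3) and compares the result with the right hand side.

```python
def linear_combination(coeffs, vectors):
    """Return c1*v1 + ... + cn*vn for vectors given as equal-length lists."""
    size = len(vectors[0])
    result = [0] * size
    for c, v in zip(coeffs, vectors):
        for i in range(size):
            result[i] += c * v[i]
    return result

# Columns and right hand side of system (1.3).
columns = [[1, 2, 3], [2, 5, 6], [3, 8, 18]]
b = [3, 9, 18]

# Trial coefficients (x, y, z); an assumption made for this illustration.
candidate = [-2, 1, 1]
produces_b = linear_combination(candidate, columns) == b
```

Here `produces_b` records whether the trial coefficients reproduce the right hand side, which is exactly the question the column picture asks.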


Example 1.1.3. 1. Let v= |. „W = E 


s0+50==[]+s = [3 


is a linear combination of v and w. 


| . Then 
FIGURE 1.1: The parallelogram OAPB. Slope of OA = slope of BP = $x_2/x_1$.
Slope of OB = slope of AP = $y_2/y_1$.


2. Let $v = \begin{bmatrix}1\\2\end{bmatrix}$. Then $\{cv : 0 \leq c \leq 2\}$ represents the line segment from $(0, 0)$
to $(2, 4)$ in $\mathbb{R}^2$.


3. Let v= A and w = A Then {cv + dw : c € R,d € R} represents the 


whole two dimensional plane R?. 


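4. Let $v = \begin{bmatrix}1\\1\\0\end{bmatrix}$ and $w = \begin{bmatrix}1\\1\\1\end{bmatrix}$. Then $S = \{cv + dw : c \in \mathbb{R}, d \in \mathbb{R}\}$
represents a two dimensional plane in $\mathbb{R}^3$, but not the whole space $\mathbb{R}^3$,
because there exists the vector $\begin{bmatrix}1\\2\\3\end{bmatrix}$ which is not in $S$.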


Example 1.1.4. (The parallelogram law for vector addition) A vector
$x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ can be visualized by the directed line segment OA from
the origin $O = (0, 0, \ldots, 0)$ to the point $A = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. If we
denote the end point of the vector $y = (y_1, y_2, \ldots, y_n)$ by $B$ and that of $x + y$
by $P$, then we have a parallelogram OAPB, with OA parallel to BP and AP
parallel to OB, since the opposite segments have the same slopes.


Exercise 1.1.5. 


1. Let u = A ,v= | . i) Sketch the directed line segments in R? that 
represent $u$ and $v$, respectively; ii) Use the parallelogram law to visualize the
vector addition $u + v$; iii) Find $2u$, $2u + 5v$ and $2v - 5u$; iv) Solve the system


of equations zu + yv = El for (x, y) € R? and draw the row picture and 


the column picture. 
1 2 1 i aie Al 
2. Let u = 1|»%= fal: Is w = pa linear combination of u and v? 


3. Is it true that every vector $(x, y) \in \mathbb{R}^2$ can be represented as a linear combination
of $v = (1, 0)$ and $w = (1, 1)$?


4. Find vectors u, v, w € R3 such that the following system 


Lb eS 
2x + 5y + 8z = —1 
z+y=l 


can be rewritten as $xu + yv + zw = b$, where $b = (1, -1, 1)$.


5. Show that R? = fo 5] +y H :x£ ER, ver} 


1.2 Length, angle and dot products 


In order to discuss geometry in Euclidean spaces, we introduce the notions 
of length and angle, which can be defined with dot products. 


Definition 1.2.1. Let $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$ be vectors
in $\mathbb{R}^n$; the dot product $x \cdot y$ is defined by

\[
x \cdot y = x_1y_1 + x_2y_2 + \cdots + x_ny_n = \sum_{i=1}^{n} x_iy_i.
\]


Example 1.2.2. 1. Let v = H „WU = A . Then 
1 2 
swell ec ala 


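2. Let $v = \begin{bmatrix}1\\1\\0\end{bmatrix}$ and $w = \begin{bmatrix}1\\1\\1\end{bmatrix}$. Then $v \cdot w = 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 1 = 2$.

3. Let $v = \begin{bmatrix}1\\1\end{bmatrix}$, $w = \begin{bmatrix}1\\-1\end{bmatrix}$. Then
\[
v \cdot w = 1 \cdot 1 + 1 \cdot (-1) = 0.
\]
We say $v$ and $w$ are orthogonal to each other and write $v \perp w$.

4. Consider the distance from $A = (1, 2)$ to the origin $O$. We have
\[
|OA| = \sqrt{(1 - 0)^2 + (2 - 0)^2} = \sqrt{1 \cdot 1 + 2 \cdot 2}.
\]
If we denote by $v$ the vector $\overrightarrow{OA}$, we have the length of $v$
\[
\|v\| = \sqrt{v \cdot v}.
\]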


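5. Consider unit vectors $u, v \in \mathbb{R}^2$. Then there exist $\alpha, \beta \in [0, 2\pi)$ such
that
\[
u = (\cos\alpha, \sin\alpha), \quad v = (\cos\beta, \sin\beta).
\]
Then we have
\[
u \cdot v = \cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\alpha - \beta).
\]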


One can check directly that the dot product satisfies the following properties.

Lemma 1.2.3. Let $u, v, w \in \mathbb{R}^n$ be vectors. Then

\[
u \cdot v = v \cdot u,
\]

\[
u \cdot (v + w) = u \cdot v + u \cdot w.
\]
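
The two identities of Lemma 1.2.3 can be spot-checked on concrete vectors. The sketch below is a minimal illustration; the function names `dot` and `add` and the sample vectors are our own choices, and a check on one example is of course not a proof.

```python
def dot(x, y):
    """Dot product of two vectors of the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def add(x, y):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(x, y)]

# Sample vectors in R^3, chosen only for illustration.
u, v, w = [1, 2, 3], [4, -1, 0], [2, 2, -5]

symmetric = dot(u, v) == dot(v, u)                           # u.v = v.u
distributive = dot(u, add(v, w)) == dot(u, v) + dot(u, w)    # u.(v+w) = u.v + u.w
```

Integer entries are used so that the equality tests are exact; with floating point entries one would compare up to a small tolerance instead.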


Definition 1.2.4. Let $v = (v_1, v_2, \ldots, v_n)$ be a vector in $\mathbb{R}^n$. The length $\|v\|$
of $v$ is defined by

\[
\|v\| = \sqrt{v \cdot v} = \left(\sum_{i=1}^{n} v_i^2\right)^{1/2}.
\]

A vector with unit length is called a unit vector.


Example 1.2.5.

Consider unit vectors $u, v \in \mathbb{R}^2$. Then there exist $\alpha, \beta \in [0, 2\pi)$ such that
\[
u = (\cos\alpha, \sin\alpha), \quad v = (\cos\beta, \sin\beta).
\]
Then we have
\[
u \cdot v = \cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\alpha - \beta).
\]

There exists $\theta \in [0, \pi]$ such that $\cos\theta = \cos(\alpha - \beta)$. Then we call $\theta$ the angle
between the vectors $u$ and $v$.
Consider nonzero vectors $u, v \in \mathbb{R}^2$. Then $\frac{u}{\|u\|}$ and $\frac{v}{\|v\|}$ are unit vectors and
there exist $\alpha, \beta \in [0, 2\pi)$ such that
\[
\frac{u}{\|u\|} = (\cos\alpha, \sin\alpha), \quad \frac{v}{\|v\|} = (\cos\beta, \sin\beta).
\]
We have
\[
\frac{u}{\|u\|} \cdot \frac{v}{\|v\|} = \cos(\alpha - \beta) = \cos\theta,
\tag{1.4}
\]
where $\theta \in [0, \pi]$ is the angle between $\frac{u}{\|u\|}$ and $\frac{v}{\|v\|}$. Notice that $u$ and $\frac{u}{\|u\|}$ have
the same direction, and so do $v$ and $\frac{v}{\|v\|}$, so $\theta \in [0, \pi]$ is also the angle between $u$ and
$v$. By (1.4) we have
\[
u \cdot v = \|u\|\|v\|\cos\theta,
\]
which also holds when $u$ or $v$ is zero. Then we have derived


Lemma 1.2.6. (Cosine formula) Let $u, v \in \mathbb{R}^2$. We have

\[
u \cdot v = \|u\|\|v\|\cos\theta,
\]

where $\theta \in [0, \pi]$ is the angle between $u$ and $v$.


An immediate consequence of the cosine formula is that $|u \cdot v| =
\|u\|\|v\||\cos\theta| \leq \|u\|\|v\|$, which is the Schwarz inequality in $\mathbb{R}^2$. We show the
general version of the Schwarz inequality in $\mathbb{R}^n$:


Lemma 1.2.7. (Schwarz inequality) Let $u, v \in \mathbb{R}^n$. We have

\[
|u \cdot v| \leq \|u\|\|v\|.
\]


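Proof. The inequality is true if $v = 0$. We assume that $v \neq 0$ and let $w = u + tv$,
$t \in \mathbb{R}$. Then $\|w\| \geq 0$ for every $t \in \mathbb{R}$. We have

\[
\begin{aligned}
0 \leq \|w\|^2 &= (u + tv) \cdot (u + tv)\\
&= u \cdot u + 2(u \cdot v)t + (v \cdot v)t^2,
\end{aligned}
\]

for every $t \in \mathbb{R}$. Since $v \cdot v > 0$, the quadratic polynomial
$(u + tv) \cdot (u + tv)$ of $t$ is nonnegative for every $t$ only if its discriminant satisfies

\[
4(u \cdot v)^2 - 4(u \cdot u)(v \cdot v) \leq 0,
\]

which is equivalent to $|u \cdot v| \leq \|u\|\|v\|$. □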
With the Schwarz inequality, we can then define angles between vectors in
$\mathbb{R}^n$:


Definition 1.2.8. Let $u, v \in \mathbb{R}^n$. We define $\theta \in [0, \pi]$ such that
\[
u \cdot v = \|u\|\|v\|\cos\theta,
\]
the angle between $u$ and $v$.
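
For nonzero vectors, Definition 1.2.8 can be turned into a small computation. The following Python sketch is one possible way to do it; the function names `dot`, `norm` and `angle` are ours, and the decision to raise an error for a zero vector is an assumption made here because the defining equation does not pin down $\theta$ in that case.

```python
import math

def dot(x, y):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length of a vector, sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def angle(u, v):
    """Angle in [0, pi] between nonzero vectors u and v."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("the angle is not determined for a zero vector")
    cos_theta = dot(u, v) / (nu * nv)
    # The Schwarz inequality keeps this ratio in [-1, 1] in exact arithmetic;
    # clamping guards against tiny floating point overshoots.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

right_angle = angle([1, 1], [1, -1])   # orthogonal vectors
```

The clamping step is where the Schwarz inequality enters: without it, `math.acos` could be handed a value marginally outside its domain.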


By properties of dot products and the Schwarz inequality, we have


Lemma 1.2.9. (Triangle inequality) Let $u, v \in \mathbb{R}^n$. We have

\[
\|u + v\| \leq \|u\| + \|v\|.
\]

Proof. We have
\[
\begin{aligned}
\|u + v\|^2 &= (u + v) \cdot (u + v)\\
&= u \cdot u + 2u \cdot v + v \cdot v\\
&\leq u \cdot u + 2\|u\|\|v\| + v \cdot v\\
&= \|u\|^2 + 2\|u\|\|v\| + \|v\|^2\\
&= (\|u\| + \|v\|)^2.
\end{aligned}
\]

Therefore we have $\|u + v\| \leq \|u\| + \|v\|$. □


Exercise 1.2.10. 


1 3 
angle $\theta$ between $u$ and $v$; iv) Verify that $|u \cdot v| \leq \|u\|\|v\|$; v) Verify that
$\|u + v\| \leq \|u\| + \|v\|$.


1. Let u = +] v= 5 . i) Find u - v; ii) Find |lull and ||v||; iii) Find the 


2. Find all possible real values of $a$ such that the quadratic polynomial $x^2 +
ax + 1$ has i) two positive roots; ii) two negative roots; iii) one negative and
one positive root; iv) no real roots, respectively.


3. Let u = i . Find all possible vectors w such that u L w, i.e., u- w = 0. 


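4. Let $u = \begin{bmatrix}1\\1\\1\end{bmatrix}$, $v = \begin{bmatrix}-1\\1\\0\end{bmatrix}$ and $w = \begin{bmatrix}1\\1\\-1\end{bmatrix}$. i) Find $u \cdot v$ and $v \cdot w$. ii) Is it
possible to find $(x, y) \neq (0, 0)$ such that $v = xu + yw$? Justify your answer.

5. Let $u, v \in \mathbb{R}^n$. Show that
\[
|u \cdot v| = \|u\|\|v\|
\]
if and only if $v = 0$ or there exists a scalar $t \in \mathbb{R}$ such that $u = tv$.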
1.3 Matrices


Recall that system (1.3) can be interpreted as finding a proper linear
combination of the vectors $u = (1, 2, 3)$, $v = (2, 5, 6)$ and $w = (3, 8, 18)$ to
produce the given vector $b = (3, 9, 18)$ on the right hand side. That is, we
are looking for scalars $x, y, z$ such that

\[
xu + yv + zw = b,
\]

which looks like a certain product between $(u, v, w)$ and $(x, y, z)$. To wit,
we write

\[
\begin{bmatrix}u & v & w\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = b,
\]
which is a "row" multiplied by a "column." The reason why we put the letters
for vectors horizontally becomes clear when we recover the values of $u$, $v$, $w$
and $b$:

\[
\begin{bmatrix}1 & 2 & 3\\2 & 5 & 8\\3 & 6 & 18\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}3\\9\\18\end{bmatrix},
\tag{1.5}
\]

where we obtain a rectangular array of numbers called a matrix; if $u$, $v$, $w$
were placed vertically, we would not know how to place their values!
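Let

\[
A = \begin{bmatrix}1 & 2 & 3\\2 & 5 & 8\\3 & 6 & 18\end{bmatrix}, \quad x = \begin{bmatrix}x\\y\\z\end{bmatrix}, \quad b = \begin{bmatrix}3\\9\\18\end{bmatrix}.
\]

System (1.3) becomes the familiar form of
\[
Ax = b.
\tag{1.6}
\]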


By comparing system (1.3) with system (1.5), we know that the so far
undefined product $Ax$ between matrices $A$ and $x$ essentially consists of rows of $A$
taking dot products with $x$. That is,

\[
\begin{bmatrix}(\text{Row 1 of } A) \cdot x\\(\text{Row 2 of } A) \cdot x\\(\text{Row 3 of } A) \cdot x\end{bmatrix} = b.
\]
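
The two readings of $Ax$, as rows of $A$ dotted with $x$ and as a linear combination of the columns of $A$, can be compared directly. The Python sketch below uses hypothetical helper names `matvec_rows` and `matvec_columns`, and the trial vector $(-2, 1, 1)$ is our own choice for illustration.

```python
def matvec_rows(A, x):
    """Compute Ax entry by entry as (row i of A) . x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matvec_columns(A, x):
    """Compute Ax as the linear combination of the columns of A with weights x."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            result[i] += xj * A[i][j]
    return result

# Coefficient matrix of system (1.5) and a trial vector chosen for illustration.
A = [[1, 2, 3], [2, 5, 8], [3, 6, 18]]
x = [-2, 1, 1]

same_result = matvec_rows(A, x) == matvec_columns(A, x)
```

Both functions perform the same multiplications and only group them differently, which is why the row picture and the column picture describe the same system.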


Example 1.3.1. 


+ 2 e ea - [3 
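\[
\begin{bmatrix}3 & -2\\-1 & 0\\2 & 5\end{bmatrix}\begin{bmatrix}2\\1\end{bmatrix} = \begin{bmatrix}(3, -2) \cdot (2, 1)\\(-1, 0) \cdot (2, 1)\\(2, 5) \cdot (2, 1)\end{bmatrix} = \begin{bmatrix}4\\-2\\9\end{bmatrix}
\]

\[
\begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}
\]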


Remark 1.3.2. 


For an $m \times n$ matrix $A$, we write $A = (a_{ij})$ when we emphasize the general
form of its entries. We also write $(A)_{ij}$, $A(i, j)$ or simply $A_{ij}$ to denote the
entry at the $(i, j)$-position.


If $A = (a_{ij})$ is an $n \times n$ square matrix, we call the entries $a_{ii}$, $i = 1, 2, \ldots, n$,
the main diagonal entries. If every main diagonal entry of $A$ is one, and every
other entry is zero, that is,

\[
a_{ij} = \begin{cases}1 & \text{if } i = j,\\0 & \text{if } i \neq j,\end{cases}
\]

we call $A$ an identity matrix and denote it by $I$. Note that

\[
Ix = x \quad \text{for every } x \in \mathbb{R}^n.
\]


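Example 1.3.3. Let $u = (1, 0, 0)$, $v = (1, 1, 0)$, $w = (1, 1, 1)$ and $b =
(b_1, b_2, b_3)$. We solve the system

\[
Ax = b,
\]

where

\[
A = \begin{bmatrix}u & v & w\end{bmatrix} = \begin{bmatrix}1 & 1 & 1\\0 & 1 & 1\\0 & 0 & 1\end{bmatrix}, \quad x = \begin{bmatrix}x\\y\\z\end{bmatrix}.
\]

That is, we solve

\[
\begin{bmatrix}1 & 1 & 1\\0 & 1 & 1\\0 & 0 & 1\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}.
\]

We notice that $A$ is a triangular matrix in the sense that the nonzero entries
are on or above the main diagonal. Such a matrix is convenient for solving the
system by back substitution. Namely, we first solve for $z$, then $y$ and $x$. From
$z = b_3$, $y + z = b_2$ and $x + y + z = b_1$ we obtain

\[
\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}b_1 - b_2\\b_2 - b_3\\b_3\end{bmatrix}.
\]

To have a solution resembling the solution $x = a^{-1}b$ of the single variable
linear equation $ax = b$, $a \neq 0$, we wish to write $(x, y, z)$ in terms of $b =
(b_1, b_2, b_3)$. We rewrite the solution as follows:

\[
\begin{aligned}
\begin{bmatrix}x\\y\\z\end{bmatrix} &= \begin{bmatrix}b_1 - b_2\\b_2 - b_3\\b_3\end{bmatrix}\\
&= b_1\begin{bmatrix}1\\0\\0\end{bmatrix} + b_2\begin{bmatrix}-1\\1\\0\end{bmatrix} + b_3\begin{bmatrix}0\\-1\\1\end{bmatrix}\\
&= \begin{bmatrix}1 & -1 & 0\\0 & 1 & -1\\0 & 0 & 1\end{bmatrix}\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}.
\end{aligned}
\]

Let

\[
B = \begin{bmatrix}1 & -1 & 0\\0 & 1 & -1\\0 & 0 & 1\end{bmatrix}.
\]

We have the solution $x = Bb$. We write $B = A^{-1}$ and $x = A^{-1}b$. Note
that we did not specify the values of $b$. The system in question has a unique
solution for every given $b \in \mathbb{R}^3$. □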
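
Back substitution for an upper triangular system can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the function name `back_substitution` is hypothetical, exact fractions are used to avoid rounding, and the right hand side $(5, 3, 2)$ is a sample value of $b$ chosen here.

```python
from fractions import Fraction

def back_substitution(U, b):
    """Solve Ux = b for an upper triangular U with nonzero diagonal entries."""
    n = len(U)
    x = [Fraction(0)] * n
    # Work from the last equation upward: each step has one new unknown.
    for i in range(n - 1, -1, -1):
        if U[i][i] == 0:
            raise ValueError("zero on the diagonal: back substitution fails")
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / U[i][i]
    return x

# The triangular matrix with columns u = (1,0,0), v = (1,1,0), w = (1,1,1).
A = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
solution = back_substitution(A, [5, 3, 2])   # sample right hand side
```

The explicit check for a zero diagonal entry marks the situation in which the procedure breaks down, as happens for a triangular matrix whose last row is zero.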


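Example 1.3.4. Let $u = (1, 0, 0)$, $v = (1, 1, 0)$, $w^* = (0, 1, 0)$ and $b =
(b_1, b_2, b_3)$. We solve the system

\[
Ax = b,
\]

where

\[
A = \begin{bmatrix}u & v & w^*\end{bmatrix} = \begin{bmatrix}1 & 1 & 0\\0 & 1 & 1\\0 & 0 & 0\end{bmatrix}, \quad x = \begin{bmatrix}x\\y\\z\end{bmatrix}.
\]

That is, we solve

\[
\begin{bmatrix}1 & 1 & 0\\0 & 1 & 1\\0 & 0 & 0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}.
\]

We notice that $A$ is also a triangular matrix but we cannot solve the system
by back substitution. The third equation of the system is

\[
0 = b_3,
\]

which may or may not be true depending on the value of $b_3$.

If $b_3 \neq 0$, system $Ax = b$ has no solution.

If $b_3 = 0$, system $Ax = b$ becomes

\[
\begin{bmatrix}1 & 1 & 0\\0 & 1 & 1\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}b_1\\b_2\end{bmatrix},
\]
which has a free variable $z$ that can be parameterized by $z = t$, $t \in \mathbb{R}$. Then
we have

\[
\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}b_1 - b_2 + t\\b_2 - t\\t\end{bmatrix} = \begin{bmatrix}b_1 - b_2\\b_2\\0\end{bmatrix} + t\begin{bmatrix}1\\-1\\1\end{bmatrix}, \quad t \in \mathbb{R},
\tag{1.7}
\]

which represents infinitely many solutions on a straight line in $\mathbb{R}^3$. □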


Let us make some observations on the previous two examples. In
Example 1.3.3, for arbitrary $b \in \mathbb{R}^3$, we have a unique solution $x = A^{-1}b$. That is,
the vector equation

\[
xu + yv + zw = b
\]

always has a unique solution for the linear combination coefficients $(x, y, z)$.
This implies that the set of vectors $\{u, v, w\}$ spans the whole space $\mathbb{R}^3$.

In Example 1.3.4, there exists $b = (b_1, b_2, b_3) \in \mathbb{R}^3$ with $b_3 \neq 0$, which is
not a linear combination of $\{u, v, w^*\}$. That is, the set of vectors $\{u, v, w^*\}$
cannot span the whole space $\mathbb{R}^3$. But why can $\{u, v, w\}$ span $\mathbb{R}^3$, when both sets
consist of three different vectors? The answer is that $\{u, v, w^*\}$ has redundant vectors,
but $\{u, v, w\}$ does not. Namely, the role of some vectors in $\{u, v, w^*\}$ can be
played by other vectors. To identify the redundancy, we set up the following
model:

\[
x_1u + x_2v + x_3w^* = 0,
\]

solving for $(x_1, x_2, x_3)$. By (1.7) with $b = 0$ and $t = 1$, we have at least one nonzero solution
$(x_1, x_2, x_3) = (1, -1, 1)$. That is,

\[
1u + (-1)v + 1w^* = 0 \iff v = u + w^*.
\]

That is, $v$ can be replaced with $u + w^*$. Therefore, the spanning role of
$\{u, v, w^*\}$ is the same as that of $\{u, w^*\}$, which cannot span a three
dimensional space.

Next we verify that there is no redundancy in $\{u, v, w\}$ for spanning $\mathbb{R}^3$.
We again set up the following model:

\[
x_1u + x_2v + x_3w = 0,
\]

solving for $(x_1, x_2, x_3)$. By the solution in Example 1.3.3 with $b = 0$, we have the only
solution $(x_1, x_2, x_3) = (0, 0, 0)$. This means that none of the vectors in
$\{u, v, w\}$ can be replaced by a linear combination of the other ones. They
are linearly independent.
Definition 1.3.5. Let $\{u_1, u_2, \ldots, u_m\}$ be a set of vectors in $\mathbb{R}^n$. If
the vector equation

\[
x_1u_1 + x_2u_2 + \cdots + x_mu_m = 0
\]

has only the trivial solution $x_1 = x_2 = \cdots = x_m = 0$, $\{u_1, u_2, \ldots, u_m\}$
is said to be linearly independent. Otherwise, $\{u_1, u_2, \ldots, u_m\}$ is said
to be linearly dependent.
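
Definition 1.3.5 can be tested mechanically by elimination: the vectors are linearly independent exactly when elimination on the array of vectors leaves as many pivots as there are vectors. The Python sketch below is one way to do this; the names `rank` and `is_linearly_independent` are ours, the notion of rank is borrowed from later in the subject, and exact fractions are used so that zero tests are reliable.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by exact Gaussian elimination on a list of rows."""
    M = [[Fraction(entry) for entry in row] for row in rows]
    pivot_row = 0
    n_cols = len(M[0]) if M else 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        for r in range(pivot_row + 1, len(M)):
            factor = M[r][col] / M[pivot_row][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == len(M):
            break
    return pivot_row

def is_linearly_independent(vectors):
    """Vectors (given as lists) are independent iff every one yields a pivot."""
    return rank(vectors) == len(vectors)

independent = is_linearly_independent([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
dependent = not is_linearly_independent([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
```

The vectors are stored as rows here purely for convenience; the number of pivots is the same whether they are placed as rows or as columns.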


We finish this chapter with examples on matrix multiplication with ele- 
mentary matrices. 


Example 1.3.6. Elementary matrices

• Consider $Ex = b$, where $b = \begin{bmatrix}b_1\\b_2\end{bmatrix}$, $x = \begin{bmatrix}x_1\\x_2\end{bmatrix}$, $E = \begin{bmatrix}1 & 0\\l & 1\end{bmatrix}$. Then $Ex = \begin{bmatrix}x_1\\x_2 + lx_1\end{bmatrix}$.
Note that the effect of multiplication by $E$ from the left of $x$ is
"adding $l$-multiple of row 1 to row 2." The solution is

\[
x = \begin{bmatrix}b_1\\b_2 - lb_1\end{bmatrix} = \begin{bmatrix}1 & 0\\-l & 1\end{bmatrix}\begin{bmatrix}b_1\\b_2\end{bmatrix}.
\]

Denote $E^{-1} = \begin{bmatrix}1 & 0\\-l & 1\end{bmatrix}$. We have the solution $x = E^{-1}b$. The effect of
multiplication by $E^{-1}$ from the left of $b$ is "subtracting $l$-multiple of row 1
from row 2." Moreover, using $x = E^{-1}b$ and the original system $Ex = b$, we
have

\[
E(E^{-1}b) = b, \quad E^{-1}(Ex) = x.
\]

That is, the multiplication actions from the left of a vector by $E$ and $E^{-1}$
cancel each other. If we treat the action $A : x \mapsto Ax$ as a function
determined by the matrix $A$, then the effect of $E^{-1} \circ E$ and $E \circ E^{-1}$ is the
same as that of the identity matrix $I$.

• Consider $Ex = b$, where $b = \begin{bmatrix}b_1\\b_2\end{bmatrix}$, $x = \begin{bmatrix}x_1\\x_2\end{bmatrix}$, $E = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$. Then $Ex = \begin{bmatrix}x_2\\x_1\end{bmatrix}$.
Note that the effect of multiplication by $E$ from the left of $x$ is "exchanging
positions of row 1 and row 2." The solution is

\[
x = \begin{bmatrix}b_2\\b_1\end{bmatrix} = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}\begin{bmatrix}b_1\\b_2\end{bmatrix}.
\]

Denote $E^{-1} = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$, which is identical to $E$ itself. We have the solution
$x = E^{-1}b$. The effect of multiplication by $E^{-1}$ from the left of $b$ is "exchanging
positions of row 1 and row 2." Moreover, using $x = E^{-1}b$ and the original
system $Ex = b$, we have

\[
E(E^{-1}b) = b, \quad E^{-1}(Ex) = x.
\]

That is, the multiplication actions from the left of a vector by $E$ and $E^{-1}$
cancel each other. The multiplication effects of $E^{-1} \circ E$ and $E \circ E^{-1}$
are the same as that of the identity matrix $I$.

• Consider $Ex = b$, where $b = \begin{bmatrix}b_1\\b_2\end{bmatrix}$, $x = \begin{bmatrix}x_1\\x_2\end{bmatrix}$, $E = \begin{bmatrix}1 & 0\\0 & c\end{bmatrix}$ with $c \neq 0$. Then
$Ex = \begin{bmatrix}x_1\\cx_2\end{bmatrix}$. Note that the effect of multiplication by $E$ from the left of $x$ is
"multiplying row 2 by $c$." The solution is

\[
x = \begin{bmatrix}b_1\\b_2/c\end{bmatrix} = \begin{bmatrix}1 & 0\\0 & 1/c\end{bmatrix}\begin{bmatrix}b_1\\b_2\end{bmatrix}.
\]

Denote $E^{-1} = \begin{bmatrix}1 & 0\\0 & 1/c\end{bmatrix}$. We have the solution $x = E^{-1}b$. The effect of
multiplication by $E^{-1}$ from the left of $b$ is "dividing row 2 by $c$." Moreover,
using $x = E^{-1}b$ and the original system $Ex = b$, we have

\[
E(E^{-1}b) = b, \quad E^{-1}(Ex) = x.
\]

That is, the multiplication actions from the left of a vector by $E$ and $E^{-1}$
cancel each other. If we treat the action $A : x \mapsto Ax$ as a function
determined by the matrix $A$, then the effect of $E^{-1} \circ E$ and $E \circ E^{-1}$ is the
same as that of the identity matrix $I$.

The aforementioned three types of matrices are called elementary matrices.
Each can be obtained by applying the elementary row operation in question
to the identity matrix. □
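
The three kinds of $2 \times 2$ elementary matrices and their canceling matrices can be checked by direct multiplication. The Python sketch below is only an illustration: the helper `matmul`, the variable names, and the sample values $l = 3$ and $c = 5$ are our own assumptions.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

l, c = 3, Fraction(5)          # sample values; c must be nonzero
I = [[1, 0], [0, 1]]

# Each elementary matrix is paired with the matrix that undoes its row operation.
E_add, E_add_inv = [[1, 0], [l, 1]], [[1, 0], [-l, 1]]          # add l*(row 1) to row 2
E_swap, E_swap_inv = [[0, 1], [1, 0]], [[0, 1], [1, 0]]         # exchange rows 1 and 2
E_scale, E_scale_inv = [[1, 0], [0, c]], [[1, 0], [0, 1 / c]]   # multiply row 2 by c

pairs = [(E_add, E_add_inv), (E_swap, E_swap_inv), (E_scale, E_scale_inv)]
all_cancel = all(matmul(E, Einv) == I and matmul(Einv, E) == I for E, Einv in pairs)
```

Multiplying a column such as `[[7], [2]]` by `E_add` or `E_swap` shows the corresponding row operation acting on the entries of the column.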


Exercise 1.3.7. 


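1. Let $A = \begin{bmatrix}1 & 0 & 0\\1 & 1 & 0\\1 & 1 & 1\end{bmatrix}$ and $B = \begin{bmatrix}1 & 0 & 0\\-1 & 1 & 0\\0 & -1 & 1\end{bmatrix}$. Compute i) $A + B$, $A + 2B$
and $A - 3B$; ii) $AB$ and $BA$.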

2. Let A = E A and B = i dl i) Compute AB and BA; ii) Is AB = 


BA? 
3. Find matrices A and b such that the system


+z=1 
2x + 5y + 8z = —1 
z+y=l 


can be rewritten in the matrix form $Ax = b$, where $x = (x, y, z)$.


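4. Let $e_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}$, $e_2 = \begin{bmatrix}0\\1\\0\end{bmatrix}$ and $e_3 = \begin{bmatrix}0\\0\\1\end{bmatrix}$. Show that $\{e_1, e_2, e_3\}$ is a
linearly independent set of vectors in $\mathbb{R}^3$.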
1 0 1 
5. Let vy = |1|, vo = |1| and v3 = |0|. Determine whether or not 
1 1 


{v1, v2, v3} is a linearly independent set of vectors in R3. 


6. Let u = (1,0), v = (1, 1) and w = (1, 2). Show that {u, v, w} is not 
linearly independent. 


7. Let $S = \{u_1, u_2, \ldots, u_n\}$ be a set of vectors in $\mathbb{R}^n$. If one of them is the
zero vector, is $S$ linearly independent?


8. Let $\{v_1, v_2, v_3\} \subset \mathbb{R}^n$ be a set of linearly independent vectors. Determine
whether $\{v_1 + v_2, v_2 + v_3, v_3 + v_1\}$ is linearly independent or not.


9. Show that every set of four vectors in R? is linearly dependent. 


10. Find the canceling matrices $E^{-1}$ of the following $E$'s.
With the preparation of Chapter 1, we start the discussion of how to solve
linear systems, using the matrix representation of linear systems and matrix
multiplication. In the process we will also develop related properties of
matrices.


2.1 Vectors and linear equations 


Solving a linear system $Ax = b$ means that we use certain operations on
the system to reduce it into the form $x = c$, or equivalently $Ix = c$, where $I$
is the identity matrix. Such operations should be reversible, in the sense that
the solution $x = c$ should be equivalent to the original system. By default,
we agree on the fact that if $u = v$, $u, v \in \mathbb{R}^n$, then $Bu = Bv$ for every $m \times n$
matrix $B$. In the language of matrices, to reduce $Ax = b$ into $Ix = c$, we
need to find a sequence of matrices $E_1, E_2, \ldots, E_q$ such that

\[
E_1Ax = E_1b \;\Rightarrow\; E_2E_1Ax = E_2E_1b \;\Rightarrow\; \cdots \;\Rightarrow\; E_q \cdots E_2E_1Ax = E_q \cdots E_2E_1b.
\]

Namely, we keep multiplying both sides of the equation by the same
matrix. The question is at which step we should stop. If the solution is unique,
that is, the solution $x$ must be a definite value, we should be able to
arrive at the situation that the coefficient matrix of $x$ becomes the identity
matrix $I$. That is, $E_q \cdots E_2E_1A = I$. In order to make sure the solution
$x = c$ is equivalent to the original system, ideal candidates for the matrices
$E_1, E_2, \ldots, E_q$ are the elementary matrices, because we know their canceling
matrices $E_1^{-1}, E_2^{-1}, \ldots, E_q^{-1}$, such that we have

\[
Ax = b \iff E_1^{-1}E_1Ax = E_1^{-1}E_1b \iff \cdots \iff E_1^{-1} \cdots E_q^{-1}E_q \cdots E_1Ax = E_1^{-1} \cdots E_q^{-1}E_q \cdots E_1b.
\]


Example 2.1.1. 
We solve the following system Ax = b, which is in system form: 


\[
\begin{aligned}
x - y &= 1\\
x + y &= 2.
\end{aligned}
\]
We follow our basic idea of eliminating variables, and at the same time use 


matrices to represent the elimination processes. We use the notation R; to 
denote the i-th row or the i-th equation. 


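Solution:

System | Matrix representation | Elementary matrix

$x - y = 1$, $x + y = 2$ | $\begin{bmatrix}1 & -1\\1 & 1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix}$ |

$\downarrow R_2 + R_1$ | | $E_1 = \begin{bmatrix}1 & 0\\1 & 1\end{bmatrix}$

$x - y = 1$, $2x = 3$ | $\begin{bmatrix}1 & 0\\1 & 1\end{bmatrix}\begin{bmatrix}1 & -1\\1 & 1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1 & 0\\1 & 1\end{bmatrix}\begin{bmatrix}1\\2\end{bmatrix}$ |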
1 —1|iz 1 0 1 
mom olaa Pa 


EE 


n-a, A si 
1 0110-30 
mo RAR- ebi 
A BIR 0-69 
mer RAS RE 
e 1E- E]


Indeed, in the process we have $E_5E_4E_3E_2E_1A = I$ and the solution is obtained
when an identity matrix appears in the last step. Let us also observe that the
so far undefined matrix multiplication between $2 \times 2$ matrices can actually be
done column by column on the second matrix. (Please verify it in the
computation.) For general matrices, we will follow this convention, which will
formally become our definition of matrix multiplication; in effect we are
justifying how convenient it is for solving linear systems.

Let us observe that during the elimination process in Example 2.1.1, if the
first step had been $R_2 - R_1$, then the interchange of two rows in the second step could
have been avoided. The upper leftmost nonzero entry 1 in row 1 is called a
pivot or a leading 1. One could use this pivot to eliminate all entries below
it. If we have a pivot in each row during the elimination, we can reduce every
nonpivot entry of a square matrix to zero. □


Notice that the variables $(x, y)$ in the matrix representation are carried
unnecessarily in each step. We use the so-called augmented matrix,
which couples the coefficient matrix with the right hand side of
the system, to represent the system by a single matrix, completely dropping
$(x, y)$.
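Example 2.1.2. Solve system (1.1):
\[
\begin{aligned}
x + 2y + 3z &= 3\\
2x + 5y + 8z &= 9\\
3x + 6y + 18z &= 18.
\end{aligned}
\]

Solution: We re-write the system of linear equations in the matrix form
$Ax = b$, where

\[
A = \begin{bmatrix}1 & 2 & 3\\2 & 5 & 8\\3 & 6 & 18\end{bmatrix}, \quad x = \begin{bmatrix}x\\y\\z\end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix}3\\9\\18\end{bmatrix}.
\]
Then the corresponding augmented matrix is
\[
[A : b] = \left[\begin{array}{ccc|c}1 & 2 & 3 & 3\\2 & 5 & 8 & 9\\3 & 6 & 18 & 18\end{array}\right].
\]

By the elementary row operations on $[A : b]$ we have

\[
\begin{aligned}
\left[\begin{array}{ccc|c}1 & 2 & 3 & 3\\2 & 5 & 8 & 9\\3 & 6 & 18 & 18\end{array}\right]
&\xrightarrow{R_2 - 2R_1}
\left[\begin{array}{ccc|c}1 & 2 & 3 & 3\\0 & 1 & 2 & 3\\3 & 6 & 18 & 18\end{array}\right]
\xrightarrow{R_3 - 3R_1}
\left[\begin{array}{ccc|c}1 & 2 & 3 & 3\\0 & 1 & 2 & 3\\0 & 0 & 9 & 9\end{array}\right]\\
&\xrightarrow{R_3/9}
\left[\begin{array}{ccc|c}1 & 2 & 3 & 3\\0 & 1 & 2 & 3\\0 & 0 & 1 & 1\end{array}\right]
\xrightarrow{R_2 - 2R_3}
\left[\begin{array}{ccc|c}1 & 2 & 3 & 3\\0 & 1 & 0 & 1\\0 & 0 & 1 & 1\end{array}\right]\\
&\xrightarrow{R_1 - 3R_3}
\left[\begin{array}{ccc|c}1 & 2 & 0 & 0\\0 & 1 & 0 & 1\\0 & 0 & 1 & 1\end{array}\right]
\xrightarrow{R_1 - 2R_2}
\left[\begin{array}{ccc|c}1 & 0 & 0 & -2\\0 & 1 & 0 & 1\\0 & 0 & 1 & 1\end{array}\right].
\end{aligned}
\]

Then we have an equivalent system with augmented matrix

\[
\left[\begin{array}{ccc|c}1 & 0 & 0 & -2\\0 & 1 & 0 & 1\\0 & 0 & 1 & 1\end{array}\right].
\]

The solution is

\[
\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}-2\\1\\1\end{bmatrix}.
\]
□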

We remark that in the solution of Example 2.1.2, the coefficient matrix
has been reduced into an upper triangular matrix (the entries below the main
diagonal are all zeros) after two eliminations. Once an upper triangular
matrix is obtained, we can use back substitution to solve for $z$, then $y$ and $x$.
The remaining steps eliminate the entries above the pivots so that we
obtain an equivalent system with a diagonal/identity coefficient matrix whose
solution is directly displayed.
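
The elimination on an augmented matrix can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not a transcription of the steps above: the function name `gauss_jordan` is hypothetical, exact fractions replace hand arithmetic, and the code clears the entries above and below each pivot in a single pass, so the intermediate matrices may differ from those of a hand computation even though the final reduced matrix is the same.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Return the reduced row echelon form of an augmented matrix [A : b]."""
    M = [[Fraction(entry) for entry in row] for row in aug]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols - 1):             # the last column holds b
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue                           # no pivot in this column
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]           # row exchange
        lead = M[pivot_row][col]
        M[pivot_row] = [entry / lead for entry in M[pivot_row]]   # make the pivot 1
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return M

# Augmented matrix of system (1.1).
reduced = gauss_jordan([[1, 2, 3, 3], [2, 5, 8, 9], [3, 6, 18, 18]])
solution = [row[-1] for row in reduced]
```

When the coefficient part of `reduced` is the identity matrix, as it is here, the last column can be read off as the solution.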

The following example deals with the situation that the solution is not 
unique, but we still carry out the elimination process until we arrive at the 
situation that the maximal number of variables has coefficients 1. 


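Example 2.1.3. Solve the following system of linear equations.

\[
\begin{aligned}
x_1 + x_2 + x_3 + 2x_4 &= -1\\
2x_1 + x_2 + 3x_3 + x_4 &= -2\\
2x_1 - 2x_2 + 4x_3 + 2x_4 &= -3.
\end{aligned}
\]

Solution: We re-write the system of linear equations in the matrix form
$Au = b$, where

\[
A = \begin{bmatrix}1 & 1 & 1 & 2\\2 & 1 & 3 & 1\\2 & -2 & 4 & 2\end{bmatrix}, \quad u = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix}-1\\-2\\-3\end{bmatrix}.
\]
Then the corresponding augmented matrix is
\[
[A : b] = \left[\begin{array}{cccc|c}1 & 1 & 1 & 2 & -1\\2 & 1 & 3 & 1 & -2\\2 & -2 & 4 & 2 & -3\end{array}\right].
\]
By the elementary row operations on $[A : b]$ we have
\[
\begin{aligned}
\left[\begin{array}{cccc|c}1 & 1 & 1 & 2 & -1\\2 & 1 & 3 & 1 & -2\\2 & -2 & 4 & 2 & -3\end{array}\right]
&\xrightarrow[R_3 - 2R_1]{R_2 - 2R_1}
\left[\begin{array}{cccc|c}1 & 1 & 1 & 2 & -1\\0 & -1 & 1 & -3 & 0\\0 & -4 & 2 & -2 & -1\end{array}\right]
\xrightarrow{(-1)R_2}
\left[\begin{array}{cccc|c}1 & 1 & 1 & 2 & -1\\0 & 1 & -1 & 3 & 0\\0 & -4 & 2 & -2 & -1\end{array}\right]\\
&\xrightarrow[R_3 + 4R_2]{R_1 - R_2}
\left[\begin{array}{cccc|c}1 & 0 & 2 & -1 & -1\\0 & 1 & -1 & 3 & 0\\0 & 0 & -2 & 10 & -1\end{array}\right]
\xrightarrow{R_3/(-2)}
\left[\begin{array}{cccc|c}1 & 0 & 2 & -1 & -1\\0 & 1 & -1 & 3 & 0\\0 & 0 & 1 & -5 & \frac{1}{2}\end{array}\right]\\
&\xrightarrow[R_2 + R_3]{R_1 - 2R_3}
\left[\begin{array}{cccc|c}1 & 0 & 0 & 9 & -2\\0 & 1 & 0 & -2 & \frac{1}{2}\\0 & 0 & 1 & -5 & \frac{1}{2}\end{array}\right].
\end{aligned}
\]

Then we have a system with augmented matrix

\[
\left[\begin{array}{cccc|c}1 & 0 & 0 & 9 & -2\\0 & 1 & 0 & -2 & \frac{1}{2}\\0 & 0 & 1 & -5 & \frac{1}{2}\end{array}\right].
\]
Let $x_4 = t$, where $t$ is an arbitrary real number. The solution is
\[
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}-2 - 9t\\\frac{1}{2} + 2t\\\frac{1}{2} + 5t\\t\end{bmatrix}, \quad t \in \mathbb{R}.
\]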
Notice that the reduced system does not have an identity coefficient matrix,
even though it is close to one. We say such a matrix is in reduced row echelon
form, which is a matrix satisfying the following:


i) If a row is not entirely zero, its first nonzero entry is 1, which we
call the pivot 1 or leading 1;


ii) All zero rows are at the bottom;


iii) For every two pivot 1's (leading 1's), the one in the higher row is
closer to the left of the matrix;


iv) Each column that contains a leading 1 has zeros everywhere else in the
column.


A matrix satisfying conditions i), ii) and iii) is said to be in row echelon
form. The elimination process to obtain a row echelon form is called Gaussian
elimination. The elimination process to obtain a reduced row echelon form
is called Gauss-Jordan elimination.
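
The conditions for the two echelon forms can be checked mechanically. The Python sketch below is one possible reading of them, with function names of our own choosing; in particular it reads condition iv) as a requirement on the columns that contain a leading 1, which is an interpretive assumption.

```python
def leading_positions(M):
    """Column index of the first nonzero entry of each row (None for a zero row)."""
    return [next((j for j, entry in enumerate(row) if entry != 0), None) for row in M]

def is_row_echelon(M):
    """Check conditions i)-iii): leading 1's, zero rows last, staircase pattern."""
    seen_zero_row = False
    previous = -1
    for row, lead in zip(M, leading_positions(M)):
        if lead is None:
            seen_zero_row = True
            continue
        # A nonzero row below a zero row, a leading entry other than 1,
        # or a pivot that does not move strictly to the right all fail.
        if seen_zero_row or row[lead] != 1 or lead <= previous:
            return False
        previous = lead
    return True

def is_reduced_row_echelon(M):
    """Additionally check iv): a pivot column is zero away from its leading 1."""
    if not is_row_echelon(M):
        return False
    for i, lead in enumerate(leading_positions(M)):
        if lead is not None and any(M[r][lead] != 0 for r in range(len(M)) if r != i):
            return False
    return True
```

For instance, an upper triangular augmented matrix with leading 1's passes `is_row_echelon` but fails `is_reduced_row_echelon` until the entries above the pivots are eliminated.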


Example 2.1.4. Consider matrices 


100 9 -2 100 9 -2 
ENO 0 1 -5 i| a: Ooh. 1 4 
000 1 4 0 0 1 -5 «E 


A is in row echelon form, but B is not because the pivot in row 2 is farther
from the left of the matrix than the pivot in row 3. The pivots should be
positioned in the matrix in a staircase shape.




The following example illustrates how we determine whether a system has a unique
solution, no solution or infinitely many solutions.


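Example 2.1.5. Let $k$ be a real number. Consider the following linear system
of equations:

\[
\begin{aligned}
x_2 + x_3 + x_4 &= -1\\
2x_1 + x_2 + 3x_3 + x_4 &= -2\\
2x_1 - 2x_2 + 4x_3 + 2x_4 &= -3\\
3x_2 - x_3 - x_4 &= k.
\end{aligned}
\tag{2.1}
\]

Find all possible values of $k$ such that system (2.1) has a unique solution, has
no solutions and has infinitely many solutions.

Solution: We re-write the system of linear equations in the matrix form
$Au = b$, where

\[
A = \begin{bmatrix}0 & 1 & 1 & 1\\2 & 1 & 3 & 1\\2 & -2 & 4 & 2\\0 & 3 & -1 & -1\end{bmatrix}, \quad u = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix}-1\\-2\\-3\\k\end{bmatrix}.
\]

Then the corresponding augmented matrix is

\[
[A : b] = \left[\begin{array}{cccc|c}0 & 1 & 1 & 1 & -1\\2 & 1 & 3 & 1 & -2\\2 & -2 & 4 & 2 & -3\\0 & 3 & -1 & -1 & k\end{array}\right].
\]

By the elementary row operations on $[A : b]$ we have

\[
\begin{aligned}
\left[\begin{array}{cccc|c}0 & 1 & 1 & 1 & -1\\2 & 1 & 3 & 1 & -2\\2 & -2 & 4 & 2 & -3\\0 & 3 & -1 & -1 & k\end{array}\right]
&\xrightarrow{R_1 \leftrightarrow R_2}
\left[\begin{array}{cccc|c}2 & 1 & 3 & 1 & -2\\0 & 1 & 1 & 1 & -1\\2 & -2 & 4 & 2 & -3\\0 & 3 & -1 & -1 & k\end{array}\right]\\
&\xrightarrow[R_4 - 3R_2]{R_3 + (-1)R_1}
\left[\begin{array}{cccc|c}2 & 1 & 3 & 1 & -2\\0 & 1 & 1 & 1 & -1\\0 & -3 & 1 & 1 & -1\\0 & 0 & -4 & -4 & k + 3\end{array}\right]\\
&\xrightarrow[R_3 + 3R_2]{R_1 - R_2}
\left[\begin{array}{cccc|c}2 & 0 & 2 & 0 & -1\\0 & 1 & 1 & 1 & -1\\0 & 0 & 4 & 4 & -4\\0 & 0 & -4 & -4 & k + 3\end{array}\right]\\
&\xrightarrow{R_4 + R_3}
\left[\begin{array}{cccc|c}2 & 0 & 2 & 0 & -1\\0 & 1 & 1 & 1 & -1\\0 & 0 & 4 & 4 & -4\\0 & 0 & 0 & 0 & k - 1\end{array}\right]\\
&\xrightarrow{R_3/4}
\left[\begin{array}{cccc|c}2 & 0 & 2 & 0 & -1\\0 & 1 & 1 & 1 & -1\\0 & 0 & 1 & 1 & -1\\0 & 0 & 0 & 0 & k - 1\end{array}\right]\\
&\xrightarrow[R_2 - R_3]{R_1/2}
\left[\begin{array}{cccc|c}1 & 0 & 1 & 0 & -\frac{1}{2}\\0 & 1 & 0 & 0 & 0\\0 & 0 & 1 & 1 & -1\\0 & 0 & 0 & 0 & k - 1\end{array}\right]\\
&\xrightarrow{R_1 - R_3}
\left[\begin{array}{cccc|c}1 & 0 & 0 & -1 & \frac{1}{2}\\0 & 1 & 0 & 0 & 0\\0 & 0 & 1 & 1 & -1\\0 & 0 & 0 & 0 & k - 1\end{array}\right].
\end{aligned}
\]
Then we arrive at
\[
\left[\begin{array}{cccc|c}1 & 0 & 0 & -1 & \frac{1}{2}\\0 & 1 & 0 & 0 & 0\\0 & 0 & 1 & 1 & -1\\0 & 0 & 0 & 0 & k - 1\end{array}\right],
\]

which can immediately lead to the reduced echelon form of the augmented
matrix $[A : b]$ if we know the value of $k$.


i) If $k \neq 1$, then the system is not consistent and has no solution because the
last equation is $0 = k - 1$ with $k - 1 \neq 0$, which is contradictory.

ii) If $k = 1$, then the system is consistent and has infinitely many solutions.
Let $x_4 = t$, where $t$ is an arbitrary real number. The solution is

\[
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}\frac{1}{2} + t\\0\\-1 - t\\t\end{bmatrix}, \quad t \in \mathbb{R}.
\]

Since $x_4$ is a free variable whenever the system is consistent, there is no value
of $k$ for which the system has a unique solution.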

Exercise 2.1.6. 
1. Redo Example 2.1.1 with the first elementary row operation $R_2 - R_1$.


2. Determine whether the following matrices are in reduced row echelon form 
and row echelon form, respectively: 


1 0 0 9 —2 100 9 1 1 0 1 
0 1 ae PR AOE ıl, [0103 
0 0 1 -5 3 0 0 1 -5 001 4 


3. Solve the following systems using Gauss-Jordan elimination:

a) $x + 3z = 1$, $2x + 3y = 3$, $4y + 5z = 5$

b) $x + 2y + 3z = 1$, $2x + 3y + 4z = 3$, $3x + 4y + 5z = 5$

c) $x + 2y + 3z = 1$, $2x + 3y + 4z = 3$, $5x + 9y + 13z = 7$
4. Consider a linear system $Ax = b$ with $A$ an $m \times n$ matrix and $b$ an $m \times 1$
matrix. Is it true that if there is more than one solution for $x$ in $\mathbb{R}^n$, there must be
infinitely many? You may use the fact that

\[
A(x + y) = Ax + Ay,
\]

\[
A(tx) = tAx,
\]

for every $x, y \in \mathbb{R}^n$ and $t \in \mathbb{R}$,
which is called the linearity of matrix multiplication.
5. Let k be a real number. Consider the following linear system of equations: 


£2 + 2£3 + z4 = 1 


221 T2 323 =2 
11 + 4x3 + 214 = 3 
kx + Ia = 1. 


(2.2) 


Find all possible values of k such that system (2.2) i) has a unique solution; 
ii) has no solutions and iii) has infinitely many solutions. 


2.2 Matrix operations 


We have dealt with matrix multiplication when we represented systems of
linear equations in the matrix form $Ax = b$, where $Ax$ is an $m \times n$ matrix times
an $n \times 1$ matrix. In this section, we discuss matrix operations for general
matrices.

Definition 2.2.1. Let $A$, $B$ be $m \times n$ matrices and $c \in \mathbb{R}$ a scalar. Then $A + B$
and $cA$ are $m \times n$ matrices defined by

\[
(A + B)_{ij} = (A)_{ij} + (B)_{ij}, \qquad (cA)_{ij} = c(A)_{ij},
\]

where $(A)_{ij}$ denotes the entry at the $(i, j)$ position of $A$.


Example 2.2.2. Let A = E 2 al B= k E ik Then we have 


345 EE 
6 6 6 24 6 
a+8= | 6 ls 24 = [e 8 fale 
Recall that in Example 2.1.1, we multiplied square matrices with the
convention that, in the product $AB$, the first matrix multiplies the second matrix
column by column. One point to note is that a product $Ax$ with a column
matrix $x$ exists if the number of columns of $A$ equals the number of rows of
$x$.

We have

Definition 2.2.3. Let $A$ be an $m \times n$ matrix, $B$ be an $n \times r$ matrix. Then
$AB$ is the $m \times r$ matrix

\[
AB = [Ab_1 : Ab_2 : \cdots : Ab_r],
\]

where $B = [b_1 : b_2 : \cdots : b_r]$.
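Example 2.2.4. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 5 & 4 & 3 \\ 3 & 2 & 1 \end{bmatrix}$. Then we have

\[
AB = \begin{bmatrix} 11 & 8 & 5 \\ 27 & 20 & 13 \end{bmatrix}.
\]

□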


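Let $A$ be an $m \times n$ matrix, $B$ be an $n \times r$ matrix. If we partition $A$ into
rows and $B$ into columns, we have

\[
AB = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} [b_1 : b_2 : \cdots : b_r]
= \begin{bmatrix} a_1 b_1 & a_1 b_2 & \cdots & a_1 b_r \\ a_2 b_1 & a_2 b_2 & \cdots & a_2 b_r \\ \vdots & \vdots & & \vdots \\ a_m b_1 & a_m b_2 & \cdots & a_m b_r \end{bmatrix},
\]

from which we have

\[
AB = \begin{bmatrix} a_1 B \\ a_2 B \\ \vdots \\ a_m B \end{bmatrix},
\]

and that

\[
(AB)_{ij} = (\text{Row } i \text{ of } A) \cdot (\text{Column } j \text{ of } B).
\]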
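The two descriptions of the product, column by column and entry by entry, can be compared directly. The following Python sketch is our own illustration rather than the book's: the helper names and the two sample matrices are assumptions chosen for the demonstration.

```python
def column(B, j):
    """Column j of B as a list."""
    return [row[j] for row in B]


def mat_vec(A, x):
    """Product of the matrix A with the column vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mul_by_columns(A, B):
    """AB = [A b_1 : A b_2 : ... : A b_r], assembled column by column."""
    cols = [mat_vec(A, column(B, j)) for j in range(len(B[0]))]
    return [list(row) for row in zip(*cols)]  # turn the columns back into rows


def mul_by_entries(A, B):
    """(AB)_ij = (Row i of A) . (Column j of B)."""
    return [[sum(a * b for a, b in zip(row, column(B, j)))
             for j in range(len(B[0]))] for row in A]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]          # illustrative 2 x 2 matrix
    B = [[5, 4, 3], [3, 2, 1]]    # illustrative 2 x 3 matrix
    print(mul_by_columns(A, B))
    print(mul_by_entries(A, B))
```

Both functions return the same $m \times r$ matrix, which is the content of the identity $(AB)_{ij} = (\text{Row } i \text{ of } A) \cdot (\text{Column } j \text{ of } B)$.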


We have the following associative property of matrix products. 


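Theorem 2.2.5. Let $A$, $B$ and $C$ be $m \times n$, $n \times p$ and $p \times q$ matrices,
respectively. Then we have

\[
(AB)C = A(BC).
\]

Proof. First of all we know that both products are $m \times q$. For every position
$(i, j)$ in the product matrix, we have

\[
\begin{aligned}
((AB)C)_{ij} &= (\text{Row } i \text{ of } AB) \cdot (\text{Column } j \text{ of } C) \\
&= \sum_{s=1}^{p} \left[(\text{Row } i \text{ of } A) \cdot (\text{Column } s \text{ of } B)\right] C_{sj} \\
&= (\text{Row } i \text{ of } A) \cdot \sum_{s=1}^{p} (\text{Column } s \text{ of } B)\, C_{sj} \\
&= (\text{Row } i \text{ of } A) \cdot (\text{Column } j \text{ of } BC) \\
&= (A(BC))_{ij}.
\end{aligned}
\]

□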


By the approach of comparing corresponding positions in the related ma- 
trices, we have the following distributive properties: 


Lemma 2.2.6. Let $A$ and $B$ be $m \times n$ matrices; $C$ and $D$ be $n \times p$ matrices;
$t \in \mathbb{R}$ be a scalar. We have

\[
\begin{aligned}
A(C + D) &= AC + AD; \\
(A + B)C &= AC + BC; \\
t(A + B) &= tA + tB.
\end{aligned}
\]


Unfortunately there is no commutative property in general for matrix prod- 
ucts. 


In general, 


which illustrates that $AB \neq BA$. Observe that $B$ is an elementary matrix
which exchanges rows of $A$ if multiplied on the left of $A$, but exchanges columns
of $A$ if multiplied on the right. We will understand this after we have learned
transposition in the later sections. □
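As a small numerical illustration of the failure of commutativity, the following Python sketch multiplies a sample $2 \times 2$ matrix $A$ by the row-exchange matrix $B$ on both sides. The particular entries of $A$ and the helper `mat_mul` are our own choices, not taken from the text.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 2], [3, 4]]   # illustrative matrix, our own choice
B = [[0, 1], [1, 0]]   # elementary matrix exchanging the two rows

BA = mat_mul(B, A)     # multiplying on the left exchanges the rows of A
AB = mat_mul(A, B)     # multiplying on the right exchanges the columns of A
print(BA, AB, AB != BA)
```

Here $BA$ lists the rows of $A$ in the opposite order while $AB$ lists its columns in the opposite order, so the two products differ.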
We finish this section with the definition of matrix powers:

\[
A^m = \underbrace{A A \cdots A}_{m \text{ copies of } A}.
\]


Exercise 2.2.8. 


1 2 


1. Let A= ; 4 


| B= + i E il i) Compute AB. ii) Does BA exist? 


1 2 1 -1 0 1] . za a 
2. Let A = E Al B= i i al i) Compute AB. ii) If B is block 


partitioned into B = [Bu : Ba], is it true AB = [AB, : AB]? 
3. Show Lemma 2.2.6. 


4. Let $A = [a_1 : a_2 : \cdots : a_n]$, $B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$ be $m \times n$ and $n \times r$ matrices. Show
that $AB = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$.


1 
5. Let A= | 4 1 1190 


‘| B= i eig i . Use Question 4 to compute AB. 
6. Let A and B be m x n and n x r matrices. Show that i) every column of 
AB is a linear combination of the columns of A; ii) every row of AB is a linear 
combination of the rows of B. 


: | . Find all matrices B such that AB = BA. 


7. Let A= E 4 


8. Let $A$ and $B$ be $n \times n$ matrices. Explain that in general we have
$(A + B)(A - B) \neq A^2 - B^2$ and $(A + B)^2 \neq A^2 + 2AB + B^2$.

9. Let $A$ be an $n \times n$ matrix. Define $V = \{B : AB = BA\}$. Show that i)
$V \neq \emptyset$; ii) if $B_1 \in V$ and $B_2 \in V$, then every linear combination of $B_1$ and $B_2$
is in $V$.

10. Give an example that $A^2 = 0$ but $A \neq 0$.

11. Give an example that $A^2 = I$ but $A \neq \pm I$.

12. Let $A$ be an $n \times n$ matrix. If we want to define a limit $\lim_{m \to \infty} A^m$, how
would you define the closeness (distance) between matrices?
2.3 Inverse matrices

To solve the linear system $Ax = b$ where $A$ is an $n \times n$ matrix and $b$ is an
arbitrary $n \times 1$ vector, we use the elimination process to reduce it into $Ix = x_0$,
where $I$ is the $n \times n$ identity matrix. We know each step of elimination can
be represented by a left multiplication of an elementary matrix on both sides
of the current linear system. That is,

\[
E_m E_{m-1} \cdots E_2 E_1 A x = E_m E_{m-1} \cdots E_2 E_1 b.
\]

If the eliminations reduce $A$ into the identity matrix $I$, that is,
$E_m E_{m-1} \cdots E_2 E_1 A = I$, we obtain the solution $x = E_m E_{m-1} \cdots E_2 E_1 b$.
For notational convenience, we write $B = E_m E_{m-1} \cdots E_2 E_1$. We have

\[
BA = I \quad \text{and} \quad x = Bb.
\]

If we bring $x = Bb$ back into the original system, we have

\[
ABb = b.
\]

Since $b$ is assumed to be arbitrary, we have $ABb = b$ for every $b$. Then we
have

\[
AB = I.
\]

In summary, in order to have a unique solution for $Ax = b$ with arbitrary $b$,
we need

\[
BA = I = AB. \tag{2.3}
\]


We notice that 


Lemma 2.3.1. If there exists an $n \times n$ matrix $B$ satisfying $BA = I = AB$,
then it is unique.

This is because if $CA = I = AC$ we have

\[
C = CI = C(AB) = (CA)B = IB = B.
\]

Definition 2.3.2. Let $A$ be an $n \times n$ matrix. If there exists $B$ such
that $BA = I = AB$, $A$ is said to be invertible. $B$ is called the inverse
of $A$ and is denoted by $A^{-1}$. If there is no such matrix $B$ satisfying
$AB = I = BA$, $A$ is said to be singular, or not invertible.


An immediate application of the definition of inverse is that if a matrix
has a zero row or zero column, it cannot be invertible. Say $A$ has a row of
zeros; then by matrix multiplication, for every matrix $B$ such that $AB$ exists,
$AB$ will have a row of zeros, which cannot be equal to the identity matrix.
Similarly, if $A$ has a column of zeros, then $BA$ has a column of zeros.

If $A$ has a zero row or zero column, $A$ is not invertible.

Example 2.3.3. 1. Since $I^2 = II = I$, the identity matrix is invertible
with $I^{-1} = I$.

2. Let $A = \operatorname{diag}[d_1, d_2, \cdots, d_n]$. If $d_1 d_2 \cdots d_n \neq 0$, then $A$ is invertible with
$A^{-1} = \operatorname{diag}[d_1^{-1}, d_2^{-1}, \cdots, d_n^{-1}]$. To be more specific, if $d_1 d_2 \cdots d_n \neq 0$,
then we have

\[
\begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix}^{-1}
= \begin{bmatrix} d_1^{-1} & & \\ & \ddots & \\ & & d_n^{-1} \end{bmatrix}.
\]

3. All elementary matrices are invertible. For instance, 


0 1 0 0 
0 =p e 0 1 0 ) 
0 1 


= 
l 
oom PRO NOIR 
oro 20r ono 
o 
O 
el 
. 
l 
O O m 
= 09 
l 
D 
= 


Now we show two equivalent conditions for invertibility of a matrix. 


Theorem 2.3.4. The following are equivalent:

i) $A$ is invertible.

ii) $Ax = 0$ has only the trivial solution $x = 0$.

iii) $A$ is equal to a product of elementary matrices.

Proof. i) $\Rightarrow$ ii) Since $A$ is invertible, the inverse $A^{-1}$ exists. Then $Ax = 0$
leads to $A^{-1}Ax = A^{-1}0$ and $Ix = x = 0$.

ii) $\Rightarrow$ iii) If $Ax = 0$ has only the trivial solution $x = 0$, then by Gaussian
elimination, the augmented matrix $[A : 0]$ can be reduced to $[I : 0]$. That is,
there exist elementary matrices $E_1, E_2, \cdots, E_m$ such that

\[
E_m E_{m-1} \cdots E_2 E_1 [A : 0] = [I : 0].
\]

Hence $E_m E_{m-1} \cdots E_2 E_1 A = I$ and we have $A = E_1^{-1} E_2^{-1} \cdots E_m^{-1}$. Since
$E_i^{-1}$, $i = 1, 2, \cdots, m$, are again elementary matrices, $A$ is a product of
elementary matrices.

iii) $\Rightarrow$ i) Suppose $A$ is a product of elementary matrices with
$A = E_1^{-1} E_2^{-1} \cdots E_m^{-1}$. Let $B = E_m E_{m-1} \cdots E_2 E_1$. We have

\[
\begin{aligned}
BA &= (E_m E_{m-1} \cdots E_2 E_1)(E_1^{-1} E_2^{-1} \cdots E_m^{-1}) \\
&= I \\
&= (E_1^{-1} E_2^{-1} \cdots E_m^{-1})(E_m E_{m-1} \cdots E_2 E_1) \\
&= AB.
\end{aligned}
\]

□


Theorem 2.3.5. Let $A$ and $B$ be $n \times n$ matrices.

i) If $BA = I$, then $B = A^{-1}$.
ii) If $AB = I$, then $B = A^{-1}$.

Proof. i) If $A$ is invertible, we immediately have $B = B(AA^{-1}) = (BA)A^{-1} =
A^{-1}$. To show that $A$ is invertible, by Theorem 2.3.4 it suffices to show that $Ax = 0$ has only
the trivial solution. Indeed, if $Ax = 0$, we have $BAx = B0$, which implies that
$Ix = x = 0$. So $Ax = 0$ has only the trivial solution. Hence $A$ is invertible.

ii) By i), applied with the roles of $A$ and $B$ exchanged, we have $A = B^{-1}$ and $B$ is invertible. Notice that $BB^{-1} =
B^{-1}B = I$, so $B^{-1}$ is invertible with inverse equal to $B$. Therefore, $A$ is invertible
with $A^{-1} = (B^{-1})^{-1} = B$. That is, $B = A^{-1}$. □


Theorem 2.3.6. Let $A$ and $B$ be $n \times n$ matrices. Then $A$ and $B$ are
invertible if and only if $AB$ is invertible.

Proof. "$\Rightarrow$" If $A$ and $B$ are invertible, $A^{-1}$ and $B^{-1}$ exist. Then we have

\[
(B^{-1}A^{-1})(AB) = (AB)(B^{-1}A^{-1}) = I,
\]

which implies that $AB$ is invertible.

"$\Leftarrow$" If $AB$ is invertible, then there exists a matrix $C$ such that $C(AB) =
(AB)C = I$. By the associative property of the matrix product, we have

\[
(CA)B = A(BC) = I,
\]

in which by Theorem 2.3.5 $A$ and $B$ are both invertible. □
Lemma 2.3.7. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then
$A$ is invertible if and only if $ad - bc \neq 0$.
If $A$ is invertible, then

\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]

Proof. We show the first part. The second part can be directly verified by
computing $AA^{-1}$.

"$\Rightarrow$" If $A$ is invertible, it cannot have a zero row. Therefore, $c$ and $d$
cannot be simultaneously zero. Suppose $ad - bc = 0$; then the system $Ax = 0$
has a nontrivial solution

\[
x = \begin{bmatrix} d \\ -c \end{bmatrix} \neq 0.
\]

By Theorem 2.3.4, $A$ is not invertible. This is a contradiction. Hence $ad - bc \neq 0$.

"$\Leftarrow$" If $ad - bc \neq 0$, the matrix

\[
B = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]

exists and satisfies $BA = I$. Therefore, $A$ is invertible with

\[
A^{-1} = B = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]

□
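The formula of Lemma 2.3.7 translates directly into a few lines of code. The following Python sketch is only an illustration under our own naming (`inverse_2x2` is not a function from the text); it uses exact fractions and returns `None` when $ad - bc = 0$.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] by the ad - bc formula, or None if singular."""
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        return None                      # ad - bc = 0: not invertible
    return [[d / det, -b / det],
            [-c / det, a / det]]


if __name__ == "__main__":
    print(inverse_2x2(1, 2, 3, 4))       # ad - bc = -2, so the inverse exists
    print(inverse_2x2(1, 2, 2, 4))       # ad - bc = 0, so None is returned
```

The single scalar test on $ad - bc$ replaces the elimination process for $2 \times 2$ matrices.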


From Lemma 2.3.7, we know that the invertibility is determined by a scalar 
quantity ad — bc. We call it the determinant of A. In the later chapters we 
will come back to this notion. 


Example 2.3.8. (Dominant matrices are invertible) We call an $n \times n$ matrix
$A = (a_{ij})$ a dominant matrix if for every $i \in \{1, 2, \cdots, n\}$, we have

\[
|a_{ii}| > \sum_{j \neq i} |a_{ij}|.
\]

We show that if $A$ is dominant, then it is invertible.

Proof. We show that if $x \neq 0$ then $Ax \neq 0$ so that Theorem 2.3.4 applies. Let
$|x_{i_0}|$ be the largest coordinate of $x$ in absolute value. Then we have

\[
\begin{aligned}
|(Ax)_{i_0}| &= |(\text{Row } i_0 \text{ of } A) \cdot x| \\
&= |a_{i_0 1} x_1 + a_{i_0 2} x_2 + \cdots + a_{i_0 i_0} x_{i_0} + \cdots + a_{i_0 n} x_n| \\
&\geq |a_{i_0 i_0} x_{i_0}| - \Big| \sum_{j \neq i_0} a_{i_0 j} x_j \Big| \\
&\geq |a_{i_0 i_0} x_{i_0}| - \sum_{j \neq i_0} |a_{i_0 j} x_j| \\
&\geq |a_{i_0 i_0}|\,|x_{i_0}| - \sum_{j \neq i_0} |a_{i_0 j}|\,|x_{i_0}| \\
&= \Big( |a_{i_0 i_0}| - \sum_{j \neq i_0} |a_{i_0 j}| \Big) |x_{i_0}| > 0.
\end{aligned}
\]

□
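The estimate in this proof can be tried on numbers. The Python sketch below is our own illustration: the function names and the sample matrix are hypothetical, chosen only so that every diagonal entry dominates its row. It checks the dominance condition and compares $|(Ax)_{i_0}|$ with the lower bound from the proof.

```python
def is_dominant(A):
    """True if |a_ii| > sum over j != i of |a_ij| for every row i."""
    n = len(A)
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def lower_bound_check(A, x):
    """Return (|(Ax)_{i0}|, (|a_{i0 i0}| - sum_{j != i0} |a_{i0 j}|) * |x_{i0}|)."""
    n = len(A)
    i0 = max(range(n), key=lambda i: abs(x[i]))      # largest coordinate of x
    ax_i0 = sum(A[i0][j] * x[j] for j in range(n))   # (Row i0 of A) . x
    margin = abs(A[i0][i0]) - sum(abs(A[i0][j]) for j in range(n) if j != i0)
    return abs(ax_i0), margin * abs(x[i0])


if __name__ == "__main__":
    A = [[4, 1, -2], [1, -5, 3], [0, 2, 3]]   # illustrative dominant matrix
    x = [1, -2, 1]
    print(is_dominant(A), lower_bound_check(A, x))
```

For a dominant matrix and any $x \neq 0$ the second number is positive and does not exceed the first, so $Ax \neq 0$.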


We finish this section with a concrete example of how to find the inverse of
a square matrix, if it exists. The idea is the same as solving the linear system
by elimination. If we can find elementary matrices $E_1, E_2, \cdots, E_m$ such that
$E_m E_{m-1} \cdots E_1 A = I$, then we obtain $A^{-1} = E_m E_{m-1} \cdots E_1$. The problem is
that it is not economical to compute the matrix product after we have
found all of the elementary matrices. We need a device to record the
product during the elimination. Indeed the coupled matrix $[A : I]$
serves this purpose very well. Namely, we have

\[
E_m E_{m-1} \cdots E_1 [A : I] = [E_m E_{m-1} \cdots E_1 A : E_m E_{m-1} \cdots E_1 I] = [I : A^{-1}].
\]

The inverse $A^{-1}$ is recorded in the second part of the coupled matrix when
the first part becomes $I$.


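Example 2.3.9. For the given matrix $A$, we use elimination to find $A^{-1}$ and
record each elementary row operation and the corresponding elementary matrix at
the same time.

Solution: We start from the coupled matrix

\[
[A : I] = \left[\begin{array}{ccc|ccc} 3 & 0 & 7 & 1 & 0 & 0 \\ 2 & 1 & 4 & 0 & 1 & 0 \\ 1 & -1 & 2 & 0 & 0 & 1 \end{array}\right].
\]

Row operation $R_1 \leftrightarrow R_3$, elementary matrix $E_1 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & -1 & 2 & 0 & 0 & 1 \\ 2 & 1 & 4 & 0 & 1 & 0 \\ 3 & 0 & 7 & 1 & 0 & 0 \end{array}\right]
\]

Row operation $R_2 - 2R_1$, elementary matrix $E_2 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & -1 & 2 & 0 & 0 & 1 \\ 0 & 3 & 0 & 0 & 1 & -2 \\ 3 & 0 & 7 & 1 & 0 & 0 \end{array}\right]
\]

Row operation $R_3 - 3R_1$, elementary matrix $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & -1 & 2 & 0 & 0 & 1 \\ 0 & 3 & 0 & 0 & 1 & -2 \\ 0 & 3 & 1 & 1 & 0 & -3 \end{array}\right]
\]

Row operation $R_3 - R_2$, elementary matrix $E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & -1 & 2 & 0 & 0 & 1 \\ 0 & 3 & 0 & 0 & 1 & -2 \\ 0 & 0 & 1 & 1 & -1 & -1 \end{array}\right]
\]

Row operation $R_2/3$, elementary matrix $E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{3} & 0 \\ 0 & 0 & 1 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & -1 & 2 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & \frac{1}{3} & -\frac{2}{3} \\ 0 & 0 & 1 & 1 & -1 & -1 \end{array}\right]
\]

Row operation $R_1 + R_2$, elementary matrix $E_6 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & 0 & 2 & 0 & \frac{1}{3} & \frac{1}{3} \\ 0 & 1 & 0 & 0 & \frac{1}{3} & -\frac{2}{3} \\ 0 & 0 & 1 & 1 & -1 & -1 \end{array}\right]
\]

Row operation $R_1 - 2R_3$, elementary matrix $E_7 = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$:

\[
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -2 & \frac{7}{3} & \frac{7}{3} \\ 0 & 1 & 0 & 0 & \frac{1}{3} & -\frac{2}{3} \\ 0 & 0 & 1 & 1 & -1 & -1 \end{array}\right]
\]

Then we have

\[
A^{-1} = \begin{bmatrix} -2 & \frac{7}{3} & \frac{7}{3} \\ 0 & \frac{1}{3} & -\frac{2}{3} \\ 1 & -1 & -1 \end{bmatrix},
\]

which can be written as the product of elementary matrices $E_7 E_6 E_5 E_4 E_3 E_2 E_1$.

□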
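The bookkeeping with the coupled matrix $[A : I]$ can be sketched in Python as follows. This is a minimal illustration under our own naming (`inverse` is a hypothetical helper), using exact fractions. It may apply the row operations in a different order than the worked example, but since the inverse is unique the right half ends at the same matrix.

```python
from fractions import Fraction


def inverse(A):
    """Gauss-Jordan elimination on [A : I]; returns A^{-1} or None if A is singular."""
    n = len(A)
    # Build the coupled matrix [A : I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        src = next((r for r in range(col, n) if M[r][col] != 0), None)
        if src is None:
            return None                          # no pivot in this column
        M[col], M[src] = M[src], M[col]          # row exchange
        p = M[col][col]
        M[col] = [x / p for x in M[col]]         # scale the pivot to 1
        for r in range(n):                       # clear the rest of the column
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                # right half now holds A^{-1}


if __name__ == "__main__":
    for row in inverse([[3, 0, 7], [2, 1, 4], [1, -1, 2]]):
        print([str(x) for x in row])
```

No elementary matrix is ever multiplied out explicitly; the right half of the coupled matrix accumulates their product.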
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Exercise 2.3.10. 


1. Determine whether or not the following matrices are invertible. Find the 
inverse of each matrix if it exists. 


obo Dad bd 


2. Determine whether or not the following matrices are invertible. Find the 
inverse of each matrix if it exists. 


1 0 0 1 0 0 
0|, b) 2 1 0j, cd) 0 6 0 
0 0 3 1 0 0 1 
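3. Determine whether or not the following matrices are invertible. Find the
inverse of each matrix if it exists.

\[
\text{a)}\ \begin{bmatrix} 1 & 2 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\text{b)}\ \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 2 \\ 0 & 3 & 6 \end{bmatrix}, \quad
\text{c)}\ \begin{bmatrix} 1 & 2 & 0 \\ 3 & 6 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]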
4. For the given matrix A, use elimination to find A~! and record each ele- 


mentary row operation and the corresponding elementary matrix at the same 
time. 


3 0 1 
A= |2 4 2 
1 -1 A 
invertible? 
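6. Let $A$ be an $n \times n$ matrix. If $A = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}$ satisfies that $r_2 = r_3 + r_1$, is $A$
invertible?

7. Let $A$ be an $n \times n$ matrix. If $A = [c_1 : c_2 : \cdots : c_n]$ satisfies that $c_2 = c_3 + c_1$,
is $A$ invertible?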


8. Let v, w € R” be vectors. Is the matrix A = | : | invertible? 


low] [pol] 
9. Give an example of a $3 \times 3$ dominant matrix and find its inverse.

10. Find a sufficient condition on $a$, $b$, $c$ and $d \in \mathbb{R}$ such that the matrix

\[
\begin{bmatrix} a^2 + b^2 & 2ab \\ 2cd & c^2 + d^2 \end{bmatrix}
\]

is invertible.

11. Let $A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$. i) Compute $A^2$; ii) Show that for every $k \geq 3$, $k \in \mathbb{N}$,
$A^k = 0$.

12. Let $A$ be an $n \times n$ matrix. Show that if $A^k = 0$, then $I - A$ is invertible
and

\[
(I - A)^{-1} = I + A + A^2 + \cdots + A^{k-1}.
\]

13. Let A be an n x n matrix and A = tI + N, t € R with N* = 0 for some 
k € N. Compute A?* in terms of t and N. 


14. Let $D = \operatorname{diag}[\lambda_1, \lambda_2, \cdots, \lambda_n]$ be a diagonal matrix with the main diagonal
entries $\lambda_1, \lambda_2, \cdots, \lambda_n$. Show that $D$ is invertible if and only if $\lambda_i \neq 0$ for
every $i = 1, 2, \cdots, n$.

15. Let $A$ be an $n \times n$ matrix. i) If $A^2 = I$, find $A^{-1}$; ii) If $A^k = I$ for
some $k \in \mathbb{N}$, find $A^{-1}$; iii) If $A^k = 0$ for some $k \in \mathbb{N}$, is it possible that $A$ is
invertible?

16. Show that $A$ is invertible if and only if $A^k$ is invertible for every $k \in
\mathbb{N}$, $k \geq 1$.

17. Let $A$ and $B$ be $n \times n$ invertible matrices. i) Give an example to show
that $A + B$ may not be invertible; ii) Show that $A + B$ is invertible if and only
if $A^{-1} + B^{-1}$ is invertible.


2.4 LU decomposition 


We have observed in solving $Ax = b$ with $A$ an $n \times n$ matrix that once we
have a triangular coefficient matrix, say $Ux = c$, we can use back substitution
to solve the system. Indeed, if elementary matrices $E_1, E_2, \cdots, E_m$ are
lower triangular and reduce $A$ into an upper triangular matrix $U$ without row
exchanges, then the product $L = E_1^{-1} E_2^{-1} \cdots E_m^{-1}$ is lower triangular. In such
a case, $Ax = b$ is equivalent to

\[
LUx = b,
\]

and we can solve the system by solving two systems with triangular matrices:

\[
Lc = b, \qquad Ux = c,
\]

which can be solved by forward and backward substitution, respectively.

Next we explain that $L$ can be obtained without really carrying out the
matrix multiplications. Assume that each elementary matrix in $E_1, E_2, \cdots, E_m$
deals with a different position $(i, j)$ of an $n \times n$ matrix with $i > j$, and
that the elimination on $A$ was carried out in the order $E_1$,
$E_2, \cdots, E_m$ successively. Then the product $L = E_1^{-1} E_2^{-1} \cdots E_m^{-1} =
E_1^{-1} E_2^{-1} \cdots E_m^{-1} I$ acts on $I$ backward, adding the $-l_{i,j}$ multiple of the
$j$-th row to the $i$-th row, where $l_{i,j}$ is the multiplier from $E_{i,j}$. With $p < q$,
the row with the position to be altered by $E_p^{-1}$ is not altered (relative to the
identity matrix) when $E_q^{-1}$ acts. To be specific, let us examine the following
example:

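Example 2.4.1. Let

\[
A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}, \quad
E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_{32} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac{2}{3} & 1 \end{bmatrix}.
\]

We have

\[
E_{32} E_{21} A = U = \begin{bmatrix} 2 & -1 & 0 \\ 0 & \frac{3}{2} & -1 \\ 0 & 0 & \frac{4}{3} \end{bmatrix}.
\]

Then the LU decomposition is $A = LU$ with

\[
L = E_{21}^{-1} E_{32}^{-1}
= \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix}.
\]

We see that the multipliers $-l_{3,2} = -\frac{2}{3}$ and $-l_{2,1} = -\frac{1}{2}$ of the elementary
matrices are placed directly into the identity matrix to form the product $L$.
Row 2, which contains the position $(2, 1)$ that will be altered by $E_{21}^{-1}$, is not altered when
$E_{32}^{-1}$ sends $-l_{3,2}$ to the $(3, 2)$-position. When $E_{21}^{-1}$ acts after $E_{32}^{-1}$, it sends
$-l_{2,1}$ to the $(2, 1)$-position using the unchanged Row 1 to produce a new Row 2,
but this new Row 2 has no effect on $E_{32}^{-1}$ anymore.

If we compute

\[
E_{32}^{-1} E_{21}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ \frac{1}{3} & -\frac{2}{3} & 1 \end{bmatrix},
\]

$E_{21}^{-1}$ produces a nontrivial entry at $(2, 1)$ in Row 2. When $E_{32}^{-1}$ acts, it uses an
already altered Row 2 and produces an unwanted entry at $(3, 1)$. In summary,
we have

If $E_1, E_2, \cdots, E_m$ are lower triangular elementary matrices which
operate on distinct positions $(i, j)$, $i > j$, of the $n \times n$ matrix $A$ and are
such that $E_m E_{m-1} \cdots E_2 E_1 A = U$ is an upper triangular matrix, then

\[
L = E_1^{-1} E_2^{-1} \cdots E_m^{-1}
\]

is lower triangular with

\[
(L)_{ij} = -l_{ij},
\]

and $l_{ij}$ is the multiplier of the corresponding elementary matrix in
$E_1, E_2, \cdots, E_m$ which operates on position $(i, j)$.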
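The rule that $L$ is filled in entry by entry, together with the two triangular solves $Lc = b$ and $Ux = c$, can be sketched in Python. The code below is our own illustration with hypothetical names; it assumes no row exchange is needed and uses exact fractions. In the code the multiplier `m` is the quantity subtracted, so it equals $-l_{ij}$ in the notation above and is stored directly in $L$.

```python
from fractions import Fraction


def lu_no_exchange(A):
    """Return (L, U) with A = LU, assuming no row exchange is needed."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]                 # row i <- row i - m * row j
            U[i] = [a - m * b for a, b in zip(U[i], U[j])]
            L[i][j] = m                           # written directly into L
    return L, U


def solve_lu(L, U, b):
    """Solve LUx = b by forward substitution (Lc = b) and back substitution (Ux = c)."""
    n = len(b)
    c = [Fraction(0)] * n
    for i in range(n):                            # L has ones on its diagonal
        c[i] = b[i] - sum(L[i][k] * c[k] for k in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


if __name__ == "__main__":
    L, U = lu_no_exchange([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
    print(L, U, solve_lu(L, U, [1, 0, 1]))
```

Once $L$ and $U$ are known, each new right-hand side $b$ costs only the two substitutions.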


Exercise 2.4.2. 


1. Let —l;j be the entry of the 4 x 4 Ey matrix below the main diago- 
nal. Which one of the following products can be obtained by directly writ- 
ing —l;; into the (i, j) position of the products? i) E37 Po En Ep Ezg; ii) 
Es En Ex) Ey Ex - 


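2. Find the LU decomposition of

\[
A = \begin{bmatrix} 3 & 0 & 1 \\ 2 & 4 & 2 \\ 1 & -1 & 5 \end{bmatrix}.
\]

3. Let $b = (1, 2, 3)$ and $A = \begin{bmatrix} 3 & 0 & 1 \\ 2 & 4 & 2 \\ 1 & -1 & 5 \end{bmatrix}$. Use the LU decomposition of $A$ to
solve the system $Ax = b$.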


4. Is it true that a matrix A does not have an LU decomposition? Justify your 
answer. 


2.5 Transpose and permutation 


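Definition 2.5.1. Let $A$ be an $m \times n$ matrix. The transpose of $A$ is an $n \times m$
matrix denoted $A^T$ and is defined by

\[
(A^T)_{ij} = (A)_{ji}.
\]

Example 2.5.2. We have the following examples.

1. If $A = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$, then $A^T = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.

2. If $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$, then $A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}$.

3. If $x, y \in \mathbb{R}^n$ are treated as $n \times 1$ matrices, then $x \cdot y = x^T y$.

Theorem 2.5.3. Suppose that $A$ and $B$ are matrices such that $A + B$,
$AB$ and $A^{-1}$ exist. We have

\[
\begin{aligned}
(A + B)^T &= A^T + B^T, && (2.4) \\
(AB)^T &= B^T A^T, && (2.5) \\
(A^{-1})^T &= (A^T)^{-1}. && (2.6)
\end{aligned}
\]

We show (2.5) while the others can be proved similarly. For every $(i, j)$
position of $(AB)^T$ we have

\[
((AB)^T)_{ij} = (AB)_{ji} = (\text{Row } j \text{ of } A) \cdot (\text{Column } i \text{ of } B)
= (\text{Row } i \text{ of } B^T) \cdot (\text{Column } j \text{ of } A^T) = (B^T A^T)_{ij}.
\]

Next we show (2.6). We need only to show $A^T (A^{-1})^T = I$, which is by
(2.5) immediately true since $A^T (A^{-1})^T = (A^{-1} A)^T = I^T = I$.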
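The reversal of order in $(AB)^T = B^T A^T$ is easy to confirm on a small case. The following Python sketch uses helper names and sample matrices of our own choosing; it is a numerical check, not a proof.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]        # illustrative 2 x 3 matrix
    B = [[1, 0], [2, 1], [0, 3]]      # illustrative 3 x 2 matrix
    print(transpose(mat_mul(A, B)))
    print(mat_mul(transpose(B), transpose(A)))
```

Note that $A^T B^T$ is a $3 \times 3$ matrix here, so it cannot equal the $2 \times 2$ matrix $(AB)^T$; the order of the factors must be reversed.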


Definition 2.5.4. Let A be an n x n matrix. If A = AT, A is called a 
symmetric matrix. 


Example 2.5.5. 1. Diagonal matrices are symmetric. 


2. If A= 


VW Nye 
AeA aN 


3 
4| , then we have $A^T = A$ and $A$ is symmetric.
6 


3. If $A$ is an $m \times n$ matrix, then both $AA^T$ and $A^T A$ are square matrices
and are symmetric.

4. If $x, y \in \mathbb{R}^n$ are treated as $n \times 1$ matrices, then $xy^T$ is an $n \times n$ matrix,
but NOT symmetric in general.
□


Theorem 2.5.6. Let $A$ be an $m \times n$ matrix. Then $A^T A$ is invertible
if and only if the columns of $A$ are linearly independent.

Proof. "$\Rightarrow$" If the columns of $A$ are not linearly independent, then the system
$Ax = 0$, which is equivalent to

\[
Ax = [c_1 : c_2 : \cdots : c_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = 0
\iff x_1 c_1 + x_2 c_2 + \cdots + x_n c_n = 0,
\]

has nontrivial solutions. Then $A^T A x = 0$ also has nontrivial solutions since
$Ax = 0$ implies $A^T A x = A^T 0 = 0$. By Theorem 2.3.4, $A^T A$ is not invertible.
This is a contradiction.

"$\Leftarrow$" Consider $A^T A x = 0$. We note that

$Ax \perp$ every column of $A$ (or every row of $A^T$);

$Ax$ itself is a linear combination of the columns of $A$ (or the rows of $A^T$).

So $Ax$ must be orthogonal to itself. Hence $Ax = 0$. Since the columns of $A$ are
linearly independent, $x = 0$ is the only solution of $A^T A x = 0$. By Theorem 2.3.4,
$A^T A$ is invertible. □


LU decomposition of symmetric matrices 


Recall that if an $n \times n$ matrix $A$ has $n$ (nonzero) pivots, its LU
decomposition can be written as $A = LDU$, where $D$ is the diagonal matrix whose
main diagonal entries are the pivots. Then $L^{-1} A = DU$, where $L^{-1}$ is actually
the product of all elementary matrices (without row exchange matrices) which
reduce $A$ into the upper triangular matrix $DU$.

Assume that $A$ is symmetric. Then the lower triangular part (below the
main diagonal) of $A$ is the same as that of $(DU)^T$, noting that $DU$ is the
remaining upper triangular part of $A$ after elimination. Therefore, by the same
set of eliminations, we can reduce $(DU)^T$ into $D$. That is,

\[
L^{-1} (DU)^T = D,
\]

which is equivalent to $(L^{-1} U^T - I) D = 0$. Since the diagonal matrix $D$ has
all pivots on the main diagonal, which are all nonzero, we have

\[
L^{-1} U^T - I = 0.
\]

That is, $L = U^T$. To summarize, we have

If $A$ is an $n \times n$ symmetric matrix with $n$ pivots and no row exchange
needed to have the decomposition $A = LDU$, then we have $U = L^T$
and

\[
A = LDL^T.
\]
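Example 2.5.7. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 6 \\ 3 & 6 & 7 \end{bmatrix}$. Then $A$ is a symmetric matrix. We
have

\[
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 6 \\ 3 & 6 & 7 \end{bmatrix}
\xrightarrow{E_{21}}
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 3 & 6 & 7 \end{bmatrix}
\xrightarrow{E_{31}}
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = DU,
\]

where $D$ and $U$ denote the diagonal and upper triangular matrices in the last
equality, respectively. We have $E_{31} E_{21} A = DU$, where

\[
E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix},
\]

with

\[
E_{21}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_{31}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}.
\]

We have $L = E_{21}^{-1} E_{31}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}$. Then the $LDL^T$ decomposition of $A$ is

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

□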
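For a symmetric matrix with nonzero pivots and no row exchange, $L$ and $D$ can be read off during a single elimination pass. The Python sketch below is our own illustration: `ldlt` and `rebuild` are hypothetical names, and the code simply assumes its input is symmetric without checking it.

```python
from fractions import Fraction


def ldlt(A):
    """Return (L, D) with A = L diag(D) L^T for symmetric A, no row exchange."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]                 # row i <- row i - m * row j
            U[i] = [a - m * b for a, b in zip(U[i], U[j])]
            L[i][j] = m
    D = [U[i][i] for i in range(n)]               # the pivots
    return L, D


def rebuild(L, D):
    """Form the product L diag(D) L^T entry by entry."""
    n = len(D)
    return [[sum(L[i][k] * D[k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[1, 2, 3], [2, 5, 6], [3, 6, 7]]
    L, D = ldlt(A)
    print(L, D, rebuild(L, D) == A)
```

Only $L$ and the pivots need to be stored, since the upper triangular factor is $L^T$.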


Remark 2.5.8. So far we have avoided an exchange of rows for LU
decomposition. However, there are indeed cases where an exchange of rows is
necessary to reduce a matrix into upper triangular form. For instance, the
following elimination has to have a row exchange to obtain an upper triangular
form:

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 8 & 10 \\ 0 & 1 & 0 \end{bmatrix}
\to \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & -2 \\ 0 & 1 & 0 \end{bmatrix}.
\]

In this example, if we know in advance that row 2 and row 3 should be exchanged
to have an LU decomposition, we could decompose $P_{32} A$ such that $P_{32} A =
LDU$.

If $A$ is invertible, then there exists a permutation matrix $P$ such that
$PA = LDU$ where $L$ and $U$ are lower and upper triangular matrices,
respectively.


Permutations 


Definition 2.5.9. An n x n matrix P is called a permutation matrix if 
the identity matrix can be obtained by rearranging the rows of P. 


By definition, there are n! permutation matrices of order n. 


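Example 2.5.10. Consider $3 \times 3$ permutation matrices.

\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
P_{21} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
P_{31} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
P_{32} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},
\]

together with the two remaining permutation matrices

\[
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
\]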


Notice that a permutation matrix $P$ which represents a single permutation (an exchange of two rows)
is symmetric and the inverse is itself. That is, $P^{-1} = P = P^T$. Let us call
it a simple permutation matrix. A general permutation matrix $E$ which
represents multiple permutations is a product of simple permutation
matrices. Assume that $E = P_1 P_2 P_3 \cdots P_n$, where $P_i$, $i = 1, 2, \cdots, n$, are simple
permutation matrices. Then we have

\[
\begin{aligned}
E^T &= (P_1 P_2 P_3 \cdots P_n)^T \\
&= P_n^T P_{n-1}^T \cdots P_1^T \\
&= P_n P_{n-1} \cdots P_1,
\end{aligned} \tag{2.7}
\]

which is in general not equal to $E$ anymore. Therefore $E$ may NOT be symmetric.
However, by (2.7) we have

\[
\begin{aligned}
E^{-1} &= (P_1 P_2 P_3 \cdots P_n)^{-1} \\
&= P_n^{-1} P_{n-1}^{-1} \cdots P_1^{-1} \\
&= P_n P_{n-1} \cdots P_1 \\
&= E^T.
\end{aligned}
\]

That is, if $E$ is a permutation matrix, then $EE^T = I = E^T E$, which implies
that

\[
(\text{Row } i \text{ of } E) \cdot (\text{Row } j \text{ of } E) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}
\]

and

\[
(\text{Column } i \text{ of } E) \cdot (\text{Column } j \text{ of } E) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}
\]

The rows (columns) of $E$ are orthogonal to each other. □


Definition 2.5.11. If $A$ satisfies that $A^{-1} = A^T$, we call it an orthogonal
matrix.

If $P$ is a permutation matrix, it is an orthogonal matrix. That is,

\[
P^{-1} = P^T.
\]


Example 2.5.12. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 6 \\ 3 & 6 & 7 \end{bmatrix}$, and $P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$. Then $A$ is
symmetric and $P$ is a nonsymmetric permutation matrix. So $PA$ is a
permutation of the rows of $A$: row 1 to row 3, row 3 to row 2 and row 2 to row 1.
That is,

\[
PA = \begin{bmatrix} 2 & 5 & 6 \\ 3 & 6 & 7 \\ 1 & 2 & 3 \end{bmatrix}.
\]

Since 1, 5, 7 have to be on the main diagonal, in order to restore symmetry
from $PA$, column 1 has to be placed in column 3, column 3 goes to column 2
and column 2 to column 1. We know that a permutation matrix that multiplies
from the right of a matrix will manipulate the columns. The aforementioned
operations can be achieved by a right multiplication of

\[
Q = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
\]

That is, $PAQ = \begin{bmatrix} 5 & 6 & 2 \\ 6 & 7 & 3 \\ 2 & 3 & 1 \end{bmatrix}$. Note that $Q = P^T$ and $PAP^T$ is always
symmetric if $A$ is symmetric. Therefore, if the symmetry of a matrix is destroyed
by multiplication of a permutation matrix, we can restore symmetry from the
product with a multiplication of its transpose from the other side. □
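The facts that a permutation matrix is orthogonal and that $PAP^T$ stays symmetric can be checked numerically. The following Python sketch is our own illustration; in particular the convention in the hypothetical helper `permutation_matrix` (row $i$ of the result is row `perm[i]` of the identity, counting from 0) is an assumption of this example.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def permutation_matrix(perm):
    """Row i of the result is row perm[i] of the identity matrix."""
    n = len(perm)
    return [[int(j == perm[i]) for j in range(n)] for i in range(n)]


def is_symmetric(A):
    return A == transpose(A)


if __name__ == "__main__":
    A = [[1, 2, 3], [2, 5, 6], [3, 6, 7]]
    P = permutation_matrix([1, 2, 0])
    PA = mat_mul(P, A)
    print(mat_mul(P, transpose(P)))                  # the identity matrix
    print(is_symmetric(PA), is_symmetric(mat_mul(PA, transpose(P))))
```

Multiplying by $P$ on the left alone destroys the symmetry, and multiplying by $P^T$ on the right restores it.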


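Example 2.5.13. Let $P = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$. Then $P$ is a permutation matrix
and hence the inverse can be obtained by

\[
P^{-1} = P^T = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
\]

□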
Exercise 2.5.14.

1. Let $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$. Find $A^{-1}$ and $A^T$.
0 1 0 
2. Let A = i i ; . i) Find AAT and ATA. ii) Determine which one of 
1 0 0 


$AA^T$ and $A^T A$ is invertible. iii) If one of $AA^T$ and $A^T A$ is invertible, does it
contradict Theorem 2.3.6?


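3. Let

\[
A = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 3 & 4 & 5 & 6 \\ 2 & 3 & 4 & 5 \end{bmatrix}.
\]

i) Find a permutation matrix $P_1$ such that $B = P_1 A$; ii) Find a permutation
matrix $P_2$ such that $A = P_2 B$; iii) Compute $P_1 P_2$ and $P_2 P_1$.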


4. Let 
rið s 
1234 
4= lo 345 
3457 


Find a permutation matrix P, a lower triangular matrix L and a diagonal 
matrix D such that A= LDL”. 


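5. Let $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Show that $R_\theta$ is an orthogonal matrix.

6. Let $x \in \mathbb{R}^n$ with $x^T x = 1$. Define the Householder matrix by

\[
H = I - 2xx^T.
\]

i) Show that $H$ is an orthogonal matrix; ii) Show that $H$ is symmetric.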


7. Let $S = \begin{bmatrix} I & A \\ A^T & O \end{bmatrix}$, where $I$ is $m \times m$ and $A$ is $m \times n$, $O$ the zero matrix.
Find a block diagonal matrix $D$ and block lower triangular matrix $L$ such that

\[
S = LDL^T.
\]

8. Show that $AA^T$ is invertible if and only if the rows of $A$ are linearly
independent.
9. We say $A$ is skew-symmetric if $A^T = -A$. i) Show that if $A$ is a
skew-symmetric $n \times n$ matrix then $a_{ii} = 0$ for every $i = 1, 2, \cdots, n$. ii) Show that if $A$ is both
symmetric and skew-symmetric, then $A = 0$.

10. Let $A$ be an $n \times n$ matrix. Show that i) $A + A^T$ is symmetric; ii) $A -
A^T$ is skew-symmetric; iii) For every square matrix $B$, there exist a unique
symmetric matrix $B_1$ and a unique skew-symmetric matrix $B_2$ such that $B =
B_1 + B_2$.

11. A matrix is called lower triangular if every entry above the main
diagonal is zero and is called upper triangular if every entry below the main
diagonal is zero. Let $A$ be an $n \times n$ invertible matrix. i) Show that if $A$ is lower
triangular, then $A^{-1}$ is also lower triangular; ii) Show that if $A$ is upper triangular,
then $A^{-1}$ is also upper triangular.


3.1 Spaces of vectors

We know that the operations of addition and scalar multiplication in
Euclidean space $\mathbb{R}^n$ produce vectors within $\mathbb{R}^n$. Namely, $\mathbb{R}^n$ is closed under
addition and scalar multiplication. In addition, the derived operations, such as
exchange of order of addition, do not produce a vector different from the one
before the operation. To be specific, for $u, v, w \in \mathbb{R}^n$, $s, t \in \mathbb{R}$, $\mathbb{R}^n$ satisfies
the following properties:

i) (Commutative property) $u + v = v + u$;

ii) (Associative property) $(u + v) + w = u + (v + w)$;

iii) (Identity element of addition) $u + 0 = u = 0 + u$;

iv) (Existence of additive inverse) $u + (-u) = 0$;

v) (Associative property on scalars) $(st)u = s(tu)$;

vi) (Distributive property on vectors) $s(u + v) = su + sv$;

vii) (Distributive property on scalars) $(s + t)u = su + tu$;

viii) (Identity element of scalar multiplication) $1u = u$.

Notice that for different sets we have different definitions of addition and scalar
multiplication. The properties we described for $\mathbb{R}^n$ do not come for free
for every set of objects.


Example 3.1.1. If we consider the set $S$ of all $2 \times 2$ invertible matrices, we
cannot expect that the sum of two invertible matrices is again invertible; for
instance,

\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

both are invertible, but $A + B$ is not. That is, invertibility is not preserved
under addition and the set $S$ does not satisfy the closedness property. □


However, there are many occasions where a common property is automatically
preserved under these operations without the need for additional
verification, and we expect the same ease as in $\mathbb{R}^n$ when working on different
sets of objects equipped with their own operations of addition and scalar
multiplication. We can check the following sets against the properties listed
for $\mathbb{R}^n$:

Example 3.1.2.
i) The set of all $m \times n$ matrices;
ii) The set of all $m \times m$ symmetric matrices;
iii) The set of all $m \times m$ skew-symmetric matrices $A$ with $A^T = -A$;
iv) The set of all solutions of $Ax = 0$ in $\mathbb{R}^n$;
v) The set of all linear combinations of vectors in $\{(1, 1, 0), (1, 0, 0)\} \subset \mathbb{R}^3$;
vi) The set with the additive identity only: $\{0\}$;
vii) The set of all polynomials;
viii) The set of all convergent sequences in $\mathbb{R}$;
ix) The set of all continuous real functions;
x) The set of all differentiable real functions;
xi) The set of all Riemann integrable real functions on $[a, b]$. □


Definition 3.1.3. Let $V$ be a set equipped with addition $+$ and scalar
multiplication over a scalar field $K$ (we assume it is either $\mathbb{R}$ or $\mathbb{C}$). If
$V$ is closed under addition and scalar multiplication, and the following
conditions (A1)-(A8) are satisfied, we call $V$ a vector space (or linear
space) over the field $K$. If the scalar field $K$ is $\mathbb{R}$, we call $V$ a real
vector space. If $K = \mathbb{C}$, we call $V$ a complex vector space. We discuss
real vector spaces by default.

(A1) For every $x, y \in V$, $x + y = y + x$.

(A2) For every $x, y, z \in V$, we have $(x + y) + z = x + (y + z)$.

(A3) There exists a $0 \in V$ such that $x + 0 = x = 0 + x$ for every $x \in V$.

(A4) For every $x \in V$, there exists $w \in V$ such that $x + w = 0$.

(A5) For every $x, y \in V$ and $k \in K$, $k(x + y) = kx + ky$.

(A6) For every $x \in V$ and $k, t \in K$, $(k + t)x = kx + tx$.

(A7) For every $x \in V$ and $k, t \in K$, $k(tx) = (kt)x$.

(A8) For every $x \in V$, $1x = x$.
We know that $\mathbb{R}^n$ has many subspaces. We also have the notion of a
subspace of a general vector space.

Definition 3.1.4. Let $V$ be a vector space. A subset $E \subset V$ is called a
subspace if it is closed under addition and scalar multiplication. That
is, for every $x, y \in E$, $k$ a scalar,

\[
x + y \in E, \qquad kx \in E.
\]

We can verify that

Lemma 3.1.5. If $E$ is a subspace of a vector space $V$, then (A1)-(A8)
are also satisfied by $E$ and $E$ itself is a vector space.


Example 3.1.6.

Let $M_{n \times n}$ denote the vector space of all $n \times n$ matrices. Then the set of all
$n \times n$ symmetric matrices is a subspace of $M_{n \times n}$.

Let $V$ be a vector space and $v_0 \in V$. Then $\{k v_0 : k \in \mathbb{R}\}$ is a subspace of $V$.

For every vector space $V$, the set $\{0\}$ is a (trivial) subspace of $V$. □

Example 3.1.7. Let $V$ be a vector space and $\{u, v\} \subset V$. Let $E$ be the set
of all linear combinations of $u$, $v$. Then $E$ is a subspace of $V$. Indeed, we need
only to check the closure property for $E$.

Closed under addition: For every $x, y \in E$, there exist scalars $c_1, c_2, c_1', c_2'$ such
that $x = c_1 u + c_2 v$, $y = c_1' u + c_2' v$. Then we have

\[
x + y = (c_1 + c_1')u + (c_2 + c_2')v,
\]

which is again a linear combination of $u$, $v$. We have $x + y \in E$.

Closed under scalar multiplication: For every $x \in E$, there exist scalars $c_1, c_2$
such that $x = c_1 u + c_2 v$. Then for every scalar $t$ we have

\[
tx = t(c_1 u + c_2 v) = tc_1 u + tc_2 v,
\]

which is also a linear combination of $u$, $v$. We have $tx \in E$. By definition of
subspaces, $E$ is a subspace of $V$. □


We call the set of all linear combinations of vectors from a given subset $S \subseteq V$ the span of $S$, denoted by $\operatorname{span}(S)$. Note that a linear combination always involves only finitely many vectors. By the same argument as in Example 3.1.7, we can show that

Lemma 3.1.8. If $S$ is a subset of the vector space $V$, then $\operatorname{span}(S)$ is a subspace of $V$.

We notice that adjoining to a spanning set, say $S = \{u, v\}$, a linear combination of its own vectors does not change the span of $S$. For instance,

\[ \operatorname{span}\{u, v\} = \operatorname{span}\{u, v, u + v\}. \]

If a spanning set $S$ is linearly independent, it has the minimal number of vectors needed to span the vector space $\operatorname{span}(S)$. At the same time, $S$ also has the maximal number of linearly independent vectors in $\operatorname{span}(S)$, because any additional vector from $\operatorname{span}(S)$ creates redundancy in $S$. As such, we have the following definition.


Definition 3.1.9. If $S$ is linearly independent, we call the number of vectors in $S$ the dimension of the vector space $V = \operatorname{span}(S)$, denoted by $\dim V$, and call $S$ a basis of the vector space $V = \operatorname{span}(S)$.

Notice that, by the definition, a set of vectors $S$ qualifies as a basis of a vector space $V$ if and only if

1) $S$ is linearly independent in $V$;
2) $S$ spans $V$.

Moreover, the dimension of a vector space does not depend on the choice of basis. Indeed we have the following dimension theorem.


Theorem 3.1.10. All bases for a vector space have the same number of vectors.

Proof. We consider bases with finitely many vectors. Suppose, for contradiction, that the vector space $V$ has two bases $\{v_1, v_2, \cdots, v_m\}$ and $\{w_1, w_2, \cdots, w_n\}$ with $m > n$. Since every $v_j$ is a linear combination of $w_1, \cdots, w_n$, there exists an $n \times m$ matrix $A = (a_{ij})$ such that

\[ [v_1, v_2, \cdots, v_m] = [w_1, w_2, \cdots, w_n] A. \]

Then $Ax = 0$ has at least one nontrivial solution $x = x_0$, since $A$ has more columns than rows: its reduced row echelon form has at most $n < m$ pivots, so there is at least one free variable. Then

\[ [v_1, v_2, \cdots, v_m] x_0 = [w_1, w_2, \cdots, w_n] A x_0 = 0. \]

That is, there exists a nontrivial linear combination of $\{v_1, v_2, \cdots, v_m\}$ that equals the zero vector, so $\{v_1, v_2, \cdots, v_m\}$ is linearly dependent. This is a contradiction. □


Example 3.1.11. Let $S = \{u, v\}$ be a set of linearly independent vectors. Justify that $\{u + v, u - v\}$ is a basis of $\operatorname{span}(S)$ and find the dimension of $\operatorname{span}(S)$.

Proof. We show that the set of vectors $\{u + v, u - v\}$ is linearly independent and spans the vector space $\operatorname{span}(S)$. Consider the vector equation

\[ c_1(u + v) + c_2(u - v) = 0, \]

which gives $(c_1 + c_2)u + (c_1 - c_2)v = 0$. Since $u, v$ are linearly independent, we have

\[ c_1 + c_2 = 0, \qquad c_1 - c_2 = 0, \]

and hence $c_1 = c_2 = 0$. Therefore $\{u + v, u - v\}$ is linearly independent and is a basis of $\operatorname{span}\{u + v, u - v\}$. Since $\{u + v, u - v\} \subseteq \operatorname{span}\{u, v\}$, we have

\[ \operatorname{span}\{u + v, u - v\} \subseteq \operatorname{span}\{u, v\}. \]

Notice that $u = \frac{(u + v) + (u - v)}{2}$ and $v = \frac{(u + v) - (u - v)}{2}$. We have

\[ \operatorname{span}\{u, v\} \subseteq \operatorname{span}\{u + v, u - v\}, \]

and hence

\[ \operatorname{span}\{u, v\} = \operatorname{span}\{u + v, u - v\}. \]

Therefore, $\{u + v, u - v\}$ is also a basis of $\operatorname{span}(S)$, and the dimension of $\operatorname{span}(S)$ is 2. □


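We close this section with an example on how to find the span of vectors in $\mathbb{R}^3$.

Example 3.1.12. Find the equation of the plane in $\mathbb{R}^3$ spanned by $v_1 = (1, 1, 0)$ and $v_2 = (0, 1, 1)$.

Solution: Let $S$ denote the plane. Every $(x, y, z) \in S$ is a linear combination of $v_1$ and $v_2$. That is, the vector equation

\[ c_1 v_1 + c_2 v_2 = (x, y, z) \]

is consistent for $c_1$ and $c_2$. Applying Gaussian elimination to the augmented matrix, we have

\[
\left[\begin{array}{cc|c} 1 & 0 & x \\ 1 & 1 & y \\ 0 & 1 & z \end{array}\right]
\xrightarrow{R_2 - R_1}
\left[\begin{array}{cc|c} 1 & 0 & x \\ 0 & 1 & y - x \\ 0 & 1 & z \end{array}\right]
\xrightarrow{R_3 - R_2}
\left[\begin{array}{cc|c} 1 & 0 & x \\ 0 & 1 & y - x \\ 0 & 0 & x - y + z \end{array}\right].
\]

Consistency requires the last entry of the third row to vanish. Then we have $x - y + z = 0$, which is the equation of the plane.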
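As a quick cross-check of Example 3.1.12, the normal of the plane can also be computed as the cross product of the two spanning vectors. The following is a minimal Python sketch under that assumption; it is not the elimination method used in the text, the helper names `cross` and `dot` are illustrative, and the cross product is specific to $\mathbb{R}^3$.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


v1 = (1, 1, 0)
v2 = (0, 1, 1)

# A vector orthogonal to both v1 and v2 is a normal of the plane span{v1, v2}.
normal = cross(v1, v2)

# The plane is the set of (x, y, z) with normal . (x, y, z) = 0.
a, b, c = normal
print(f"{a}*x + {b}*y + {c}*z = 0")
```

The printed coefficients are those of $x - y + z = 0$, in agreement with the elimination above.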


Exercise 3.1.13.

1. Which of the following subsets of $\mathbb{R}^3$ are subspaces of $\mathbb{R}^3$? If yes, find a basis and the dimension of each of the subspaces.

i) $\{(x, y, z) : x + y + z = 0\}$;
ii) $\{(x, y, z) : xyz = 0\}$;
iii) $\{(x, y, z) : x + y + z = 1\}$;
iv) $\{(x, y, z) : x = y = z\}$;
v) $\{(x, y, z) : y = z\}$.

2. Let $V$ be a vector space. Show that the zero vector $0$ is unique.

3. Let $V$ be a vector space. Show that for every $x \in V$, the negative $w$ such that $x + w = 0$ is unique.

4. Let $V$ be a vector space. Show that for every $x \in V$, $0x = 0$.

5. Find a basis and the dimension of $\{A \in M_{22} : A^T = -A\}$, where $M_{22}$ denotes the vector space of all $2 \times 2$ matrices.

6. Find a basis and the dimension of $\{A \in M_{33} : A^T = -A\}$, where $M_{33}$ denotes the vector space of all $3 \times 3$ matrices.


7. Show that $S = \{(1, 1), (1, 0)\}$ is a basis of $\mathbb{R}^2$.

8. Let $A$ be an $n \times n$ matrix. Show that $V = \{B : AB = BA\}$ is a subspace of $M_{n \times n}$.

9. Let $V$ be a vector space and let $U$ and $W$ be subspaces of $V$. Show that $U \cap W$ is a subspace of $V$.

10. Give an example to show that the union of two subspaces may not be a subspace.

11. Let $V$ be a vector space and let $U$ and $W$ be subspaces of $V$. Define $U + W$ by

\[ U + W = \{x + y : x \in U,\ y \in W\}. \]

Show that $U + W$ is a subspace of $V$.


12. Let $u$ and $v$ be linearly independent vectors in $\mathbb{R}^2$. Show that $\mathbb{R}^2 = \operatorname{span}\{u, v\}$.

13. Show that if the subset $S$ has $m$ vectors in the $n$-dimensional space $V$ with $m > n$, then $S$ must be linearly dependent. (One may use the proof method for Theorem 3.1.10.)

14. Find the equation of the plane in $\mathbb{R}^3$ spanned by $v_1 = (-1, 1, 0)$ and $v_2 = (0, 1, -1)$.

15. We call an $(n - 1)$-dimensional subspace of an $n$-dimensional vector space $V$ a hyperplane in $V$. Find the equation of the hyperplane in $\mathbb{R}^4$ spanned by $v_1 = (-1, 1, 0, 0)$, $v_2 = (0, -1, 1, 0)$ and $v_3 = (0, 0, -1, 1)$.


3.2 Nullspace, row space and column space


Let $A$ be an $m \times n$ matrix. The set of all solutions $N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$ is a subspace of $\mathbb{R}^n$. Indeed, by linearity of matrix multiplication, for every $u, v \in N(A)$ and for every scalar $t$, we have

\[ A(u + v) = Au + Av = 0 + 0 = 0, \qquad A(tu) = tAu = t0 = 0, \]

which imply that $u + v \in N(A)$ and $tu \in N(A)$.


Definition 3.2.1. Let $A$ be an $m \times n$ matrix. The set of all vectors

\[ N(A) = \{x \in \mathbb{R}^n : Ax = 0\} \]

is called the nullspace of $A$.

We can obtain a nullspace by means of elimination. Indeed, if $E$ is a product of elementary matrices, then $E$ is invertible and

\[ EAx = 0 \iff Ax = 0. \]

Therefore, the nullspace of $A$ is not changed by elimination, and we can find the nullspace using the reduced row echelon form.


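Example 3.2.2. Let $E = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$. Then in the solution of $Ex = 0$ with $x = (x_1, x_2, x_3, x_4)$, $x_1, x_3$ are leading variables and $x_2, x_4$ are free variables. We have

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} -x_4 \\ x_2 \\ 0 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 0 \\ x_2 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} -x_4 \\ 0 \\ 0 \\ x_4 \end{bmatrix}
= x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix},
\]

so that

\[
N(E) = \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}
\]

and $\dim N(E) = 2$. We call $(0, 1, 0, 0)$ and $(-1, 0, 0, 1)$ special solutions of $Ex = 0$, which constitute a basis of $N(E)$. The number of free variables is the dimension of the nullspace. □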
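The recipe of Example 3.2.2 (one special solution per free variable) can be sketched in Python with exact rational arithmetic. This is only an illustration: the function names `rref` and `nullspace_basis` are hypothetical, and the matrix $E$ is assumed to be the $2 \times 4$ matrix with rows $(1, 0, 0, 1)$ and $(0, 0, 1, 0)$, which is consistent with the special solutions stated in the example.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of 0-based pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # column c has no pivot; it belongs to a free variable
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]  # scale so that the pivot is 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def nullspace_basis(rows):
    """Special solutions of Ax = 0: one vector per free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)  # set this free variable to 1, the others to 0
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]  # solve for the leading variables
        basis.append(v)
    return basis


E = [[1, 0, 0, 1], [0, 0, 1, 0]]
basis = nullspace_basis(E)
for v in basis:
    print([str(x) for x in v])
```

The number of vectors returned equals the number of free variables, which is the dimension of the nullspace.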




We know an $m \times n$ matrix $A$ is a rectangular array of numbers. Every row of $A$ can be regarded as a vector in $\mathbb{R}^n$ and we call it a row vector. Every column of $A$ can be regarded as a vector in $\mathbb{R}^m$ and we call it a column vector.

Definition 3.2.3. Let $A$ be an $m \times n$ matrix. The span of all rows of $A$ is a subspace of $\mathbb{R}^n$, and is called the row space of $A$, denoted by $R(A)$. The span of all columns of $A$ is a subspace of $\mathbb{R}^m$, and is called the column space of $A$, denoted by $C(A)$.


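Example 3.2.4. Let $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Then we have the row space

\begin{align*}
R(A) &= \operatorname{span}\{(1, 0, 0), (0, 1, 0)\} \\
&= \{u \in \mathbb{R}^3 : u = x(1, 0, 0) + y(0, 1, 0),\ x, y \in \mathbb{R}\} \\
&= \{u \in \mathbb{R}^3 : u = (x, y, 0),\ x, y \in \mathbb{R}\} \\
&= \text{the } xy\text{-plane in } \mathbb{R}^3,
\end{align*}

and the column space

\begin{align*}
C(A) &= \operatorname{span}\{(1, 0), (0, 1)\} \\
&= \{v \in \mathbb{R}^2 : v = x(1, 0) + y(0, 1),\ x, y \in \mathbb{R}\} \\
&= \{v \in \mathbb{R}^2 : v = (x, y),\ x, y \in \mathbb{R}\} \\
&= \mathbb{R}^2.
\end{align*}
□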


We have seen in Example 3.2.4 that the row space can be read off when the matrix is in reduced row echelon form. (See Example 2.1.3 for the definition of reduced row echelon form.) To be specific,

If a matrix $R$ is in reduced row echelon form, the row space of $R$ is spanned by the row vectors with pivots.

If a matrix is not in reduced row echelon form, we can reduce it to reduced row echelon form, whose pivot rows form a basis for its row space. The question is: will the row space be the same after elementary row operations? To find an answer, let us assume we have an $m \times n$ matrix $A$ which is related to a matrix $B$ by left multiplication with an invertible matrix $E$ (a product of elementary matrices). Then we have

\[ A = EB \quad \text{and} \quad B = E^{-1}A. \]

By matrix multiplication, we have

\begin{align*}
\text{Row } i \text{ of } A &= (\text{Row } i \text{ of } E)\, B \\
&= (\text{Row } i \text{ of } E) \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} \\
&= (\text{a linear combination of the rows } r_1, \cdots, r_m \text{ of } B).
\end{align*}

Therefore, every row of $A$ is a linear combination of the rows of $B$. That is, $R(A) \subseteq R(B)$. By the same argument applied to $B = E^{-1}A$, we have $R(B) \subseteq R(A)$. Therefore, $R(A) = R(B)$.


Lemma 3.2.5. The row space of A is not changed after elementary 


row operations. 


Unfortunately elementary row operations may change the column space of a matrix. For example,

\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\longrightarrow
EA = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}.
\]

The column space of $A$ is $\operatorname{span}\{(1, 2)\}$. But after the row operation, the column space of $EA$ becomes $\operatorname{span}\{(1, 0)\}$, which is not equal to $\operatorname{span}\{(1, 2)\}$. However, this does not mean row operations are useless for finding the column space. Indeed, a row operation $E$ on $A$ does not change the linear dependency among the columns of $A$, noticing that $Ax = 0 \iff EAx = 0$, i.e.,

\[
Ax = [c_1 : c_2 : \cdots : c_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = 0 \iff EAx = 0.
\]

If the only solution of $Ax = 0$ is the trivial solution $x = (x_1, x_2, \cdots, x_n) = 0$, then all columns of $A$ are linearly independent; so are the columns of $EA$.

If $Ax = 0$ has a nontrivial solution $x = (x_1, x_2, \cdots, x_n) \neq 0$, say, $x = (1, \frac{1}{2}, 1, 0, \cdots, 0)$, then we have the following linear dependency among the columns of $A$:

\[ 1 \cdot (\text{Column 1 of } A) + \tfrac{1}{2} \cdot (\text{Column 2 of } A) + 1 \cdot (\text{Column 3 of } A) = 0. \]

Since $x$ is also a solution of $EAx = 0$, we have

\[ 1 \cdot (\text{Column 1 of } EA) + \tfrac{1}{2} \cdot (\text{Column 2 of } EA) + 1 \cdot (\text{Column 3 of } EA) = 0, \]

which is the same linear relation as the one among the corresponding columns of $A$.




Lemma 3.2.6. The linear dependency of columns of A is not changed 


after elementary row operations. 


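Example 3.2.7. Consider the $4 \times 5$ matrix $A = [a_1 \mid a_2 \mid a_3 \mid a_4 \mid a_5]$, where the columns are

\[
a_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix},\quad
a_2 = \begin{bmatrix} -2 \\ 3 \\ 1 \\ 2 \end{bmatrix},\quad
a_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 5 \end{bmatrix},\quad
a_4 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 13 \end{bmatrix},\quad
a_5 = \begin{bmatrix} 2 \\ -2 \\ 4 \\ 5 \end{bmatrix}.
\]

a) Find a set of vectors in $\{a_1, a_2, a_3, a_4, a_5\}$ which is a basis of the column space of $A$.

b) Find the dimension of the column space of $A$.

c) Given the basis of the column space of $A$ obtained in part a), write the nonbasis vectors in $\{a_1, a_2, a_3, a_4, a_5\}$ as linear combinations of the basis vectors.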

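Solution: a) By the elementary row operations on $A$ we have

\begin{align*}
\begin{bmatrix} 1 & -2 & 1 & 1 & 2 \\ -1 & 3 & 0 & 2 & -2 \\ 0 & 1 & 1 & 3 & 4 \\ 1 & 2 & 5 & 13 & 5 \end{bmatrix}
&\xrightarrow{R_2 + R_1,\ R_4 - R_1}
\begin{bmatrix} 1 & -2 & 1 & 1 & 2 \\ 0 & 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 3 & 4 \\ 0 & 4 & 4 & 12 & 3 \end{bmatrix} \\
&\xrightarrow{R_1 + 2R_2,\ R_3 - R_2,\ R_4 - 4R_2}
\begin{bmatrix} 1 & 0 & 3 & 7 & 2 \\ 0 & 1 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix} \\
&\xrightarrow{R_3 / 4}
\begin{bmatrix} 1 & 0 & 3 & 7 & 2 \\ 0 & 1 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix} \\
&\xrightarrow{R_1 - 2R_3,\ R_4 - 3R_3}
\begin{bmatrix} 1 & 0 & 3 & 7 & 0 \\ 0 & 1 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\end{align*}

We note that the leading 1's are in columns 1, 2 and 5 of the reduced row echelon form of $A$. Then, correspondingly, $\{a_1, a_2, a_5\}$ is a basis of the column space of $A$.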

b) Since $\{a_1, a_2, a_5\}$ is a basis of the column space of $A$, the dimension of the column space of $A$ is 3.

c) Let $c_1, c_2, c_3, c_4, c_5$ be the columns of the reduced row echelon form of $A$ derived in part a). Then we have

\[ c_3 = 3c_1 + c_2, \qquad c_4 = 7c_1 + 3c_2. \]

Since elementary row operations do not change the linear dependence among the columns of $A$, we have

\[ a_3 = 3a_1 + a_2, \qquad a_4 = 7a_1 + 3a_2. \]
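The computation in Example 3.2.7 can be reproduced with a short Python sketch using exact fractions. It is an illustration only: the helper `rref` is a hypothetical name, and the pivot columns are reported with 0-based indices, so columns 1, 2 and 5 of the text appear as 0, 1 and 4.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of 0-based pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


# The matrix A = [a1 | a2 | a3 | a4 | a5] of Example 3.2.7, written row by row.
A = [
    [1, -2, 1, 1, 2],
    [-1, 3, 0, 2, -2],
    [0, 1, 1, 3, 4],
    [1, 2, 5, 13, 5],
]

R, pivots = rref(A)
print("pivot columns (0-based):", pivots)

# For a non-pivot column c, the first len(pivots) entries of column c of R
# are the coefficients of a_c in terms of the pivot columns of A.
relations = {}
for c in range(len(A[0])):
    if c not in pivots:
        relations[c] = [R[i][c] for i in range(len(pivots))]
        print(f"column {c}:", [str(x) for x in relations[c]])
```

Because row operations preserve linear relations among columns (Lemma 3.2.6), the coefficients read from the reduced form apply to the original columns of $A$.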


Exercise 3.2.8. 


1. Let

\[ A = \begin{bmatrix} 1 & 4 & 9 & 1 \\ 2 & 5 & 1 & -1 \\ 3 & 7 & 0 & 3 \end{bmatrix}. \]

i) Find the columns of $A$ which constitute a basis of its column space; ii) Write the nonbasis columns of $A$ as linear combinations of the basis columns; iii) Find the nullspace of $A$ and determine a basis.


2. Find the rows of A (not its reduced row echelon form) which constitute a 
basis of its row space, where 


ROPHR 
R uN 
WO ww 


-1 


3. Show that $N(A) = N(EA)$ if $E$ is invertible.

4. Construct an example of a matrix $A$ and an elementary matrix $E$ such that

\[ C(A) \neq C(EA). \]

5. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a function defined by $f(x) = Ax$ where $A$ is an $m \times n$ real matrix. i) Show that the range of $f$ is the column space of $A$. ii) Show that $f$ is a linear function. (See Exercise 2.1.6 for the definition of linearity.)

6. Let $A$ be an $m \times n$ real matrix. Find a linear function $g : \mathbb{R}^m \to \mathbb{R}^n$ such that the range of $g$ is the row space of $A$.

7. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function. Show that there exists a unique $m \times n$ real matrix $A$ such that $f(x) = Ax$.


8. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ be linear functions. i) Show that $g \circ f : \mathbb{R}^n \to \mathbb{R}^k$ is also a linear function; ii) Find a matrix $C$ such that $(g \circ f)(x) = Cx$ for every $x \in \mathbb{R}^n$.

9. Let $V$ be a real vector space and $u, v \in V$. Show that

\[ \operatorname{span}\{u, v\} = \operatorname{span}\{3u + 2v, 4u - 5v\}. \]

10. Let $V$ and $W$ be real vector spaces and $A$ a subset of $V$. Let $f(A)$ be the image of $A$ under the function $f : V \to W$. That is, $f(A) = \{f(x) : x \in A\}$. Show that if $f$ is linear then $f(\operatorname{span}(A)) = \operatorname{span}(f(A))$.


3.3 Solutions of Ax = b 


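Now we turn to the solution structure of $Ax = b$. We examine the following example.

Example 3.3.1. Let $E = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$ and consider $Ex = b$ with $b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$. We have the augmented matrix

\[ \left[\begin{array}{cccc|c} 1 & 0 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 0 & b_2 \end{array}\right]. \]

Then in the solution $x = (x_1, x_2, x_3, x_4)$, $x_1, x_3$ are still pivot (leading) variables and $x_2, x_4$ are free variables:

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} b_1 - x_4 \\ x_2 \\ b_2 \\ x_4 \end{bmatrix}
= \begin{bmatrix} b_1 \\ 0 \\ b_2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} -x_4 \\ 0 \\ 0 \\ x_4 \end{bmatrix}
= \begin{bmatrix} b_1 \\ 0 \\ b_2 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]

We notice that $x_p = (b_1, 0, b_2, 0)$ is a particular solution obtained by setting the free variables to zero, and $x_n = x_2(0, 1, 0, 0) + x_4(-1, 0, 0, 1)$ is the general solution of $Ex = 0$. Namely, the solution can be written as $x = x_p + x_n$. □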


It is not by chance that in Example 3.3.1 a solution of $Ax = b$ is the sum of a particular solution $x_p$ and a solution $x_n$ of the homogeneous system $Ax = 0$.

Theorem 3.3.2. $x$ is a solution of $Ax = b$ if and only if $x = x_p + x_n$, where $x_p$ is a particular solution of $Ax = b$ and $x_n$ is a solution of the homogeneous system $Ax = 0$.


Proof. ($\Leftarrow$) If $x = x_p + x_n$, we have $Ax = A(x_p + x_n) = Ax_p + Ax_n = b + 0 = b$. That is, $x = x_p + x_n$ is a solution.

($\Rightarrow$) If $x$ is a solution of $Ax = b$, then for every particular solution $x_p$, we have $A(x - x_p) = Ax - Ax_p = b - b = 0$. That is, $x_n = x - x_p$ is a solution of the homogeneous system $Ax = 0$. □

By Theorem 3.3.2, every two solutions of $Ax = b$ differ by a solution of the homogeneous system $Ax = 0$. Hence the general solution of $Ax = b$ is a particular solution of $Ax = b$ plus the general solution of $Ax = 0$.

There is one more question remaining: when is $Ax = b$ solvable? Recalling that $Ax$ is just a linear combination of the columns of $A$, we have

Theorem 3.3.3. $Ax = b$ is solvable if and only if $b \in C(A)$.

In terms of the augmented matrix, $Ax = b$ is solvable if and only if the last column of the reduced row echelon form of $[A : b]$ is not a pivot column. We rephrase Theorem 3.3.3 as

Theorem 3.3.4. $Ax = b$ is solvable if and only if $\operatorname{rank}(A) = \operatorname{rank}([A : b])$, where $\operatorname{rank}(A)$ is the dimension of the column space of $A$, which equals the number of pivots.


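Example 3.3.5. (Example 2.1.5 revisited.) Let $k$ be a real number. Consider the following linear system of equations:

\begin{equation}
\begin{aligned}
x_2 + x_3 + x_4 &= -1 \\
2x_1 + x_2 + 3x_3 + x_4 &= -2 \\
2x_1 - 2x_2 + 4x_3 + 2x_4 &= -3 \\
3x_2 - x_3 - x_4 &= k.
\end{aligned}
\tag{3.1}
\end{equation}

Find all possible values of $k$ such that system (3.1) has a unique solution, has no solutions, or has infinitely many solutions.

Solution: We rewrite the system of linear equations in the matrix form $Ax = b$, where

\[
A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 2 & 1 & 3 & 1 \\ 2 & -2 & 4 & 2 \\ 0 & 3 & -1 & -1 \end{bmatrix},\quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix},\quad
b = \begin{bmatrix} -1 \\ -2 \\ -3 \\ k \end{bmatrix}.
\]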

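Then the corresponding augmented matrix is

\[
[A : b] = \left[\begin{array}{cccc|c} 0 & 1 & 1 & 1 & -1 \\ 2 & 1 & 3 & 1 & -2 \\ 2 & -2 & 4 & 2 & -3 \\ 0 & 3 & -1 & -1 & k \end{array}\right]
\xrightarrow{\text{eliminations}}
\left[\begin{array}{cccc|c} 1 & 0 & 0 & -1 & \frac{1}{2} \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 & k - 1 \end{array}\right],
\]

from which we can tell that $\operatorname{rank}(A) = 3$, but $\operatorname{rank}([A : b])$ depends on the value of $k$.

i) If $k \neq 1$, $\operatorname{rank}([A : b]) = 4 \neq \operatorname{rank}(A)$. By Theorem 3.3.4, system $Ax = b$ has no solution. From the point of view of Theorem 3.3.3, system $Ax = b$ has no solution since $b \notin C(A)$. Recall that elementary row operations do not change linear dependency among the columns of a matrix.

ii) If $k = 1$, $\operatorname{rank}([A : b]) = 3 = \operatorname{rank}(A)$. By Theorem 3.3.4, system $Ax = b$ has at least one solution. Note that the homogeneous system $Ax = 0$ has a nontrivial nullspace because $\operatorname{rank}(A) = 3$ is less than the number of columns, 4, of $A$, so there is a free variable in the solution. Hence system $Ax = b$ has infinitely many solutions. Let $x_4 = t$, where $t$ is an arbitrary real number. The solution is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} + t \\ 0 \\ -1 - t \\ t \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} \\ 0 \\ -1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ -1 \\ 1 \end{bmatrix},\quad t \in \mathbb{R},
\]

where $x_p = (\frac{1}{2}, 0, -1, 0)$ is a particular solution of $Ax = b$ and $x_n = t(1, 0, -1, 1)$, $t \in \mathbb{R}$, is the general solution of the homogeneous system $Ax = 0$. Since $\operatorname{rank}(A) = 3 < 4$, the system never has a unique solution.
□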
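The rank criterion of Theorem 3.3.4 can be tried out on system (3.1) with a small Python sketch. The function names `rref`, `rank` and `classify` are hypothetical, and exact fractions are used to avoid rounding issues; this is an illustration of the criterion rather than a general-purpose solver.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of 0-based pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(rows):
    """Rank = number of pivots in the reduced row echelon form."""
    return len(rref(rows)[1])


A = [
    [0, 1, 1, 1],
    [2, 1, 3, 1],
    [2, -2, 4, 2],
    [0, 3, -1, -1],
]


def classify(k):
    """Classify system (3.1) for a given k by comparing rank(A) and rank([A : b])."""
    b = [-1, -2, -3, k]
    augmented = [row + [bi] for row, bi in zip(A, b)]
    if rank(A) != rank(augmented):
        return "no solution"
    if rank(A) == len(A[0]):
        return "unique solution"
    return "infinitely many solutions"


for k in (0, 1, 2):
    print(k, classify(k))
```

Only $k = 1$ makes the two ranks agree, matching the analysis in the example.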


Exercise 3.3.6. 


1. Let the following matrices be the augmented matrices [A : b] of the system 
Ax = b. i) Determine whether the system is consistent or not. ii) Find all 
possible solutions if Ax = b is consistent. iii) If Ax = b is consistent, write b 
into a linear combination of the columns of A. 


1 357 1 367 
a) baze ajio zeh ola 005 
0125 012 5 
2. Let $A$ be an $n \times n$ matrix. Show that $Ax = b$ has a unique solution if and only if $A$ is invertible.

3. Let

\[
A = \begin{bmatrix} 1 & 3 & 5 & 7 \\ 3 & 0 & 2 & 6 \\ 0 & 1 & 2 & 5 \\ 3 & 0 & 3 & 12 \end{bmatrix},\quad
b = \begin{bmatrix} 1 \\ 2 \\ 3 \\ k \end{bmatrix}.
\]

Find conditions on $k \in \mathbb{R}$ such that $Ax = b$, $x \in \mathbb{R}^4$, has 1) a unique solution; 2) no solution; 3) infinitely many solutions, respectively.

4. Let $A$ be an $m \times n$ matrix. Show that if $Ax = b$ has two distinct solutions, then it has infinitely many solutions.

5. Show that if $x_1, x_2$ are both solutions of $Ax = b$, then i) $x_1 - x_2$ is a solution of $Ax = 0$; ii) for every $t \in \mathbb{R}$, $x_1 + t(x_1 - x_2)$ is a solution of $Ax = b$.

6. Let $A$ and $B$ be $n \times n$ real matrices and $x_0 \in \mathbb{R}^n$ a solution of $Ax = b$. Show that i) $x_0$ is also a solution of $BAx = Bb$; ii) a solution of $BAx = Bb$ may not be a solution of $Ax = b$.


3.4 Rank of matrices 


By Lemma 3.2.5 and Lemma 3.2.6, for a given m x n matrix, the dimension 
of the row space is the number of leading 1’s (or pivot 1’s) in the reduced row 
echelon form. The dimension of the column space is the number of leading 1's, 
too. Therefore, we have 


Lemma 3.4.1. Let A be an mx n matrix. The dimensions of the row 


space and column space of A are equal. 


We call the dimension of the row space of a matrix A the rank of A, which 
is also equal to the dimension of the column space. We call the dimension of 
the nullspace of A the nullity of A. 


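Example 3.4.2. Let

\[ A = \begin{bmatrix} 1 & 3 & 5 & 7 \\ 3 & 0 & 2 & 6 \\ 0 & 1 & 2 & 5 \end{bmatrix}. \]

To find the rank of $A$, we use elementary row operations to reduce $A$ to row echelon form:

\begin{align*}
\begin{bmatrix} 1 & 3 & 5 & 7 \\ 3 & 0 & 2 & 6 \\ 0 & 1 & 2 & 5 \end{bmatrix}
&\xrightarrow{R_2 - 3R_1}
\begin{bmatrix} 1 & 3 & 5 & 7 \\ 0 & -9 & -13 & -15 \\ 0 & 1 & 2 & 5 \end{bmatrix} \\
&\xrightarrow{R_2 + 9R_3}
\begin{bmatrix} 1 & 3 & 5 & 7 \\ 0 & 0 & 5 & 30 \\ 0 & 1 & 2 & 5 \end{bmatrix} \\
&\xrightarrow{R_2 \leftrightarrow R_3}
\begin{bmatrix} 1 & 3 & 5 & 7 \\ 0 & 1 & 2 & 5 \\ 0 & 0 & 5 & 30 \end{bmatrix}.
\end{align*}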

Since the row echelon form of $A$ has three pivots, $\operatorname{rank}(A) = 3$. □

Our next question is how to relate $\dim R(A)$, $\dim C(A)$ and $\dim N(A)$. Recall that the leading 1's in the reduced row echelon form $E$ of $A$ correspond to leading variables in the solution of $Ax = 0$, while the columns without leading 1's correspond to free variables. Since the number of leading variables plus the number of free variables is exactly the number of columns of $A$, we have the following counting theorem.

Theorem 3.4.3. Let $A$ be an $m \times n$ matrix. Then we have

\[ \dim R(A) + \dim N(A) = n. \]

Remark 3.4.4. Notice that $\dim R(A) \leq m$. If $m < n$, that is, the number of columns of $A$ is larger than its number of rows, then $\dim N(A) \geq n - m \geq 1$. That is, $N(A)$ is nontrivial.


Let $A$ be an $m \times n$ matrix. We call

\[ N(A^T) = \{x \in \mathbb{R}^m : A^T x = 0\} = \{x \in \mathbb{R}^m : x^T A = 0\} \]

the left nullspace of $A$. Notice that $\dim R(A) = \dim C(A) = \dim R(A^T)$ because they all are equal to the number of pivots in the reduced row echelon form. Then by Theorem 3.4.3 applied to the $n \times m$ matrix $A^T$, we have

\[ \dim R(A^T) + \dim N(A^T) = m. \]

That is,

Lemma 3.4.5. Let $A$ be an $m \times n$ matrix. Then we have

\[ \dim C(A) + \dim N(A^T) = m. \]


Rank one matrix 


For a matrix of rank one, we can simplify its representation into a product of vectors. Indeed, if $\operatorname{rank}(A) = 1$ where $A$ is $m \times n$, then every row of $A$ is a scalar multiple of a pivot row, say $u^T$, where by default the vector $u$ is regarded as a column matrix. That is, there exist scalars $c_1, c_2, \cdots, c_m$ such that

\[
A = \begin{bmatrix} c_1 u^T \\ c_2 u^T \\ \vdots \\ c_m u^T \end{bmatrix}
= \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} u^T.
\]

Let $v = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}$. Then $v \neq 0$ and $A = vu^T$, where $u \in \mathbb{R}^n$, $v \in \mathbb{R}^m$. More importantly, we have

If $\operatorname{rank}(A) = 1$, then $Ax = 0$ is equivalent to $u^T x = 0$, where $u^T$ is a nonzero row of $A$.


Rank of products 


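Let $A$ be $m \times n$ and $B$ be $n \times r$. We are interested in how $\operatorname{rank}(A)$, $\operatorname{rank}(B)$ and $\operatorname{rank}(AB)$ are related. Indeed, by matrix multiplication:

\[ AB = A[b_1 : b_2 : \cdots : b_r] = [Ab_1 : Ab_2 : \cdots : Ab_r], \]

which implies that each column of $AB$ is a linear combination of the columns of $A$. Therefore we have $\operatorname{rank}(AB) \leq \operatorname{rank}(A)$. Moreover, writing $a_1, a_2, \cdots, a_m$ for the rows of $A$,

\[
AB = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} B
= \begin{bmatrix} a_1 B \\ a_2 B \\ \vdots \\ a_m B \end{bmatrix},
\]

which implies that each row of $AB$ is a linear combination of the rows of $B$. Therefore we have $\operatorname{rank}(AB) \leq \operatorname{rank}(B)$. In summary, we have

Theorem 3.4.6.

\[ \operatorname{rank}(AB) \leq \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}. \]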
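A small numerical illustration of Theorem 3.4.6 is sketched below in Python. The matrices `A` and `B` are made-up examples chosen for this sketch (they do not come from the text), and `rref`, `rank` and `matmul` are hypothetical helper names.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of 0-based pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(rows):
    """Rank = number of pivots."""
    return len(rref(rows)[1])


def matmul(A, B):
    """Product of an m x n matrix A and an n x r matrix B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Illustrative matrices: A is 2 x 3 with rank 2, B is 3 x 2 with rank 1.
A = [[1, 0, 0], [0, 1, 0]]
B = [[1, 1], [1, 1], [0, 0]]
AB = matmul(A, B)

print(rank(A), rank(B), rank(AB))
```

Here the rank of the product is limited by the factor of smaller rank, as the theorem predicts.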


Exercise 3.4.7. 


1. Find the ranks and dimensions of the nullspaces of the following matrices . 


0 30 1307 1 
a [ESO 7, e 95, ġġ 3951, d f 
2 5 5 2553 2 


2. Let $A$ be an $m \times n$ matrix with $m > n$. Show that $AA^T$ is not invertible.

3. Let $A$ be an $m \times n$ matrix. Find all possible vectors $x$ such that $x \in R(A) \cap N(A)$.

4. Let $A$ be an $m \times n$ matrix. Show that if $x_0 \neq 0$ is a solution of $Ax = 0$, then $Ax = 0$ has infinitely many nontrivial solutions.

5. Let $A$ be an $m \times n$ matrix. Show that system $Ax = b$ has a unique solution if and only if

\[ \operatorname{rank}(A) = \operatorname{rank}([A : b]) = n. \]

6. Let $A$ be an $m \times n$ matrix. Show that system $Ax = b$ has infinitely many solutions if and only if

\[ \operatorname{rank}(A) = \operatorname{rank}([A : b]) < n. \]

7. Let

\[ A = \begin{bmatrix} 1 & 3 & 5 & 7 \\ 3 & 9 & 15 & 21 \\ 9 & 27 & 45 & 63 \end{bmatrix}. \]

Show that $\operatorname{rank}(A) = 1$ and find $u \in \mathbb{R}^4$, $v \in \mathbb{R}^3$ such that $A = vu^T$. Is this decomposition of $A$ unique?

8. Show that if $\operatorname{rank}(A) = 1$, then every column of $A$ is a scalar multiple of one specific column of $A$.

9. Use block multiplication of matrices to show that

\[ \operatorname{rank}(A + B) \leq \operatorname{rank}(A) + \operatorname{rank}(B). \]


10. Let rank(A) = s. Find the ranks of the following matrices 


24, [A 4], ak a ab 


3.5 Bases and dimensions of general vector spaces 


Using the notion of span, we defined basis and dimension for vector spaces that are spanned by an a priori known linearly independent set of vectors. We say a vector space is finite dimensional if it can be spanned by a finite set of vectors. Otherwise, we say the vector space is infinite dimensional. However, so far there is no guarantee that every vector space has a basis. In this section, we address this issue and show through examples how to find a basis for a given vector space.


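Example 3.5.1. Let $M_{22}$ be the set of all $2 \times 2$ matrices, which is a vector space with matrix addition and scalar multiplication. For every $A \in M_{22}$, we can represent $A$ as

\begin{align*}
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
&= \begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & b \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ c & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & d \end{bmatrix} \\
&= a \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
\end{align*}

Therefore we have

\[
M_{22} = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}.
\]

Next we show that this spanning set is linearly independent. Consider the vector equation

\[
c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = 0,
\]

which has only the trivial solution $(c_1, c_2, c_3, c_4) = (0, 0, 0, 0)$. Then by the definition of basis, we have verified that

\[
\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]

is a basis of $M_{22}$. □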


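Example 3.5.2. Let $A$ be an $n \times n$ invertible matrix. Then the set of the columns of $A$ is a basis of $\mathbb{R}^n$, and $C(A) = R(A) = \mathbb{R}^n$. Indeed, the reduced row echelon form of $A$ is $I$ and hence the set of $n$ columns of $A$ is linearly independent. If the set of $n$ columns of $A$ did not span $\mathbb{R}^n$, then it could be enlarged to a basis with more than $n$ vectors, which is impossible by Theorem 3.1.10 since $\mathbb{R}^n$ has the standard basis $\{e_1, e_2, \cdots, e_n\}$, where

\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix},\quad \cdots,\quad
e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},
\]

and the only nonzero entry of $e_i$ is 1 at the $i$-th coordinate. □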


Example 3.5.3. Find a basis of $E = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 - 2x_2 + 3x_3 = 0\}$. $E$ is a plane in $\mathbb{R}^3$ passing through $(0, 0, 0)$. For every $x = (x_1, x_2, x_3) \in E$ we have $x_1 = 2x_2 - 3x_3$ and

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2x_2 - 3x_3 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2x_2 \\ x_2 \\ 0 \end{bmatrix} + \begin{bmatrix} -3x_3 \\ 0 \\ x_3 \end{bmatrix}
= x_2 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}.
\]

That is, $E = \operatorname{span}\{(2, 1, 0), (-3, 0, 1)\}$. Since $c_1 (2, 1, 0) + c_2 (-3, 0, 1) = 0$ implies $c_1 = c_2 = 0$, the set $\{(2, 1, 0), (-3, 0, 1)\}$ is linearly independent and is a basis for $E$. We remark that $E$ can be rewritten as

\begin{align*}
E &= \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 - 2x_2 + 3x_3 = 0\} \\
&= \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x \cdot (1, -2, 3) = 0\} \\
&= \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x \perp (1, -2, 3)\},
\end{align*}

which means $E$ is the set of all vectors orthogonal to the given vector $(1, -2, 3)$, which is called the normal of the plane. Indeed, every vector of $E$ is orthogonal to every vector from the subspace $F = \{k(1, -2, 3) : k \in \mathbb{R}\}$. □


Example 3.5.4. Consider the set $P_n$ of all polynomials with degree less than or equal to $n \in \mathbb{N}$. For every $f \in P_n$, we have

\[ f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \]

which is a linear combination of the set of polynomials $S = \{1, x, \cdots, x^n\}$. That is, $P_n = \operatorname{span}(S)$. We claim that $S$ is a basis for $P_n$. To check linear independence of $S$, we consider the vector equation

\[ a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0 \quad \text{for all } x \in \mathbb{R}. \]

If $(a_0, a_1, \cdots, a_n) \neq 0$, then the left-hand side is a nonzero polynomial of degree at most $n$, so the equation holds for at most $n$ values of $x$ (instead of all $x \in \mathbb{R}$). Therefore, we have $(a_0, a_1, \cdots, a_n) = 0$ and the set of polynomials $S = \{1, x, \cdots, x^n\}$ is linearly independent. That is, $S = \{1, x, \cdots, x^n\}$ is a basis of $P_n$, and $\dim P_n = n + 1$. □

Remark 3.5.5. From Example 3.5.4, we notice that if the basis $S = \{1, x, \cdots, x^n\}$ is fixed, every $f \in P_n$ is uniquely determined by the vector of its coefficients $(a_0, a_1, \cdots, a_n) \in \mathbb{R}^{n+1}$. We call $(a_0, a_1, \cdots, a_n) \in \mathbb{R}^{n+1}$ the coordinate vector of $f$ with respect to the basis $S$.


Lemma 3.5.6. Let $V$ be a finite dimensional vector space with a basis $S = \{v_1, v_2, \cdots, v_n\}$. Then the coordinate vector $[x]_S \in \mathbb{R}^n$ of every vector $x \in V$ is unique.

Proof. Suppose not. Then there exists $[x]'_S \in \mathbb{R}^n$ with $[x]'_S \neq [x]_S$ such that

\[ x = [v_1, v_2, \cdots, v_n][x]_S = [v_1, v_2, \cdots, v_n][x]'_S, \]

which leads to

\[ [v_1, v_2, \cdots, v_n]([x]_S - [x]'_S) = 0. \]

Since $S = \{v_1, v_2, \cdots, v_n\}$ is a basis and is linearly independent, we have $[x]_S - [x]'_S = 0$ and hence $[x]_S = [x]'_S$. This is a contradiction. □


Change of basis


Let $V$ be a vector space with two bases $B = \{v_1, v_2, \cdots, v_n\}$ and $B' = \{w_1, w_2, \cdots, w_n\}$. Then every $v_i$, $i = 1, 2, \cdots, n$, is a linear combination of the vectors in $B'$. That is, there exists an $n \times n$ matrix $P$ such that

\[ [v_1, v_2, \cdots, v_n] = [w_1, w_2, \cdots, w_n] P. \]

Then $P$ is invertible because system $Px = 0$ has only the trivial solution. Indeed,

\begin{align*}
Px = 0 &\Rightarrow [w_1, w_2, \cdots, w_n] Px = 0 \\
&\Rightarrow [v_1, v_2, \cdots, v_n] x = 0 \\
&\Rightarrow x = 0.
\end{align*}

We call $P$ the transition matrix from the basis $B$ to the basis $B'$ and write $P = P_{B \to B'}$.

We are interested in how the coordinate vectors are related when one basis is changed into another. Let $x \in V$ be a vector with coordinate vector $[x]_B \in \mathbb{R}^n$ under the basis $B$, and $[x]_{B'} \in \mathbb{R}^n$ under the basis $B'$. Then, on the one hand, we have

\begin{align*}
x &= [v_1, v_2, \cdots, v_n][x]_B \\
&= ([w_1, w_2, \cdots, w_n]P)[x]_B \\
&= [w_1, w_2, \cdots, w_n](P[x]_B).
\end{align*}

On the other hand, we have

\[ x = [w_1, w_2, \cdots, w_n][x]_{B'}. \]

Since $B' = \{w_1, w_2, \cdots, w_n\}$ is a basis, $[w_1, w_2, \cdots, w_n](P[x]_B) = [w_1, w_2, \cdots, w_n][x]_{B'}$ implies that

\[ P[x]_B = [x]_{B'}. \]

In summary we have


Lemma 3.5.7. Let $V$ be a vector space with two bases $B = \{v_1, v_2, \cdots, v_n\}$ and $B' = \{w_1, w_2, \cdots, w_n\}$. Then there is an $n \times n$ invertible transition matrix $P_{B \to B'}$ such that

\[ [v_1, v_2, \cdots, v_n] = [w_1, w_2, \cdots, w_n] P_{B \to B'} \]

and for every $x \in V$ its coordinate vectors with respect to the bases $B$ and $B'$ satisfy

\[ P_{B \to B'}[x]_B = [x]_{B'}. \]

For many vector spaces, there are certain bases, such as the standard bases, with which the coordinate vectors are easy to compute. For example, $\mathbb{R}^n$ has the standard basis $\{e_1, e_2, \cdots, e_n\}$ with which every vector of $\mathbb{R}^n$ coincides with its coordinate vector, and $P_n$ has the standard basis $\{1, x, x^2, \cdots, x^n\}$ with which the coordinate vector of a polynomial is the vector in $\mathbb{R}^{n+1}$ consisting of its coefficients. Moreover, for a general $n$-dimensional vector space $V$, each column of the transition matrix $P_{B \to S}$ from the basis $B = \{v_1, v_2, \cdots, v_n\}$ to the (standard) basis $S$ is the coordinate vector of the corresponding vector of $B$ with respect to $S$:

\[ P_{B \to S} = [[v_1]_S : [v_2]_S : \cdots : [v_n]_S]. \]

Similarly, for a basis $B' = \{w_1, w_2, \cdots, w_n\}$, we have

\[ P_{B' \to S} = [[w_1]_S : [w_2]_S : \cdots : [w_n]_S]. \]

Then for every $x \in V$, we have

\[ P_{B \to S}[x]_B = [x]_S = P_{B' \to S}[x]_{B'}. \]

Therefore we have

\[ [x]_{B'} = P^{-1}_{B' \to S} P_{B \to S} [x]_B. \]

Since $x$ is arbitrary, we have

\[ P^{-1}_{B' \to S} P_{B \to S} = P_{B \to B'}. \]

In summary we have

Lemma 3.5.8. Let $V$ be an $n$-dimensional vector space with bases $S$, $B$ and $B'$. Then we have

\[ P_{B \to B'} = P^{-1}_{B' \to S} P_{B \to S}. \]

When the dimension $n$ is large, $P^{-1}_{B' \to S} P_{B \to S}$ can be obtained by Gauss-Jordan elimination:

\[ [P_{B' \to S} : P_{B \to S}] \xrightarrow{\text{elementary row operations}} [I : P^{-1}_{B' \to S} P_{B \to S}]. \]
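Example 3.5.9. Let $B = \{(1, 1), (1, 0)\}$, $B' = \{(0, 1), (1, -1)\}$ be two bases of $\mathbb{R}^2$ (why are they bases?).

1) Find the transition matrix $P_{B \to B'}$;

2) If a vector $x$ has coordinate vector $[x]_B = (-1, 1)$, find $[x]_{B'}$.

Solution: 1) To find $P_{B \to B'}$, we make use of the standard basis $S$ of $\mathbb{R}^2$. Note that

\[
P_{B \to S} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix},\quad
P_{B' \to S} = \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}.
\]

Then we have

\[
P_{B \to B'} = P^{-1}_{B' \to S} P_{B \to S}
= \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.
\]

2) By Lemma 3.5.7, we have

\[
[x]_{B'} = P_{B \to B'}[x]_B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \end{bmatrix}.
\]

□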
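The Gauss-Jordan recipe $[P_{B' \to S} : P_{B \to S}] \to [I : P_{B \to B'}]$ can be sketched in Python for the two bases of Example 3.5.9. The function names `rref` and `transition_matrix` are hypothetical, each basis is given as a list of vectors written in standard coordinates, and exact fractions are used throughout.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of 0-based pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def transition_matrix(B, B_prime):
    """Row reduce [P_{B'->S} : P_{B->S}] to [I : P_{B->B'}] and return the right block."""
    n = len(B)
    # The basis vectors are the COLUMNS of P_{B'->S} and P_{B->S}.
    augmented = [
        [B_prime[j][i] for j in range(n)] + [B[j][i] for j in range(n)]
        for i in range(n)
    ]
    R, pivots = rref(augmented)
    if pivots != list(range(n)):
        raise ValueError("B' is not a basis")
    return [row[n:] for row in R]


B = [(1, 1), (1, 0)]
B_prime = [(0, 1), (1, -1)]

P = transition_matrix(B, B_prime)
x_B = [-1, 1]
# Coordinates with respect to B' are obtained as P [x]_B.
x_B_prime = [sum(p * c for p, c in zip(row, x_B)) for row in P]

print([[str(e) for e in row] for row in P])
print([str(e) for e in x_B_prime])
```

For small matrices the explicit inverse used in the example is just as convenient; the elimination form avoids computing the inverse separately when $n$ is large.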


Remark 3.5.10. (optional) For every vector space $V$, $\{0\} \subseteq V$ is a subspace. By convention, we accept that the empty set is a basis for $\{0\}$ and is linearly independent. Hence we say $\dim\{0\} = 0$.

Now we consider a nontrivial vector space $V$, which has at least one nonzero vector, say $v_1 \neq 0$; then $S_1 = \{v_1\}$ is linearly independent. If $V = \operatorname{span}(S_1)$, then we have found a basis for $V$. Otherwise, there exists $v_2 \notin \operatorname{span}(S_1)$. We add $v_2$ into $S_1$ to obtain a linearly independent set of vectors $S_2 = S_1 \cup \{v_2\}$. If $V = \operatorname{span}(S_2)$, then we have found a basis for $V$. Otherwise, we continue to add vectors from $V \setminus \operatorname{span}(S_2)$ to obtain a new linearly independent set of vectors $S_3$. If the process stops at a finite step $n$ with $V \setminus \operatorname{span}(S_n) = \emptyset$, we obtain a basis with finitely many vectors for the vector space $V$. Otherwise, the space is infinite dimensional. Certainly this is not an efficient way of finding bases for infinite dimensional spaces, and it is often not trivial to find a basis for a specific infinite dimensional space.

Define

\[ Y = \text{the collection of all linearly independent subsets of } V. \]

Then $Y$ is partially ordered by set containment $\subseteq$, where a partial order is a relation on $Y$ which is

1) reflexive: for every $x \in Y$ we have $x \subseteq x$;
2) transitive: for every $x, y, z \in Y$ with $x \subseteq y$, $y \subseteq z$ we have $x \subseteq z$;
3) anti-symmetric: for every $x, y \in Y$ with $x \subseteq y$, $y \subseteq x$ we have $x = y$.

If $V$ is an infinite dimensional vector space, then the process of constructing a basis gives a sequence of linearly independent sets:

\[ S_1 \subset S_2 \subset \cdots \subset S_n \subset \cdots \]

Then $X = \{S_n : n \in \mathbb{N}\}$ is a totally ordered subcollection of $Y$. (Every pair in $X$ can be ordered by set containment $\subseteq$.)

Zorn's lemma in set theory claims that if every totally ordered subset $X$ of a partially ordered set $Y$ has an upper bound in $Y$, then $Y$ has a maximal element. If we take the union of the members to obtain an upper bound for every totally ordered subcollection such as $X$, Zorn's lemma applies to our current situation and asserts the existence of a maximal element in $Y$, which is a maximal linearly independent set of vectors, namely, a basis of $V$. We arrive at

Theorem 3.5.11. Every vector space has a basis.

The notion of basis we have discussed so far is called an algebraic basis or Hamel basis. Other types of bases of a vector space may be defined when the vector space has extra structure.


Exercise 3.5.12. 


1. For the following matrices, determine whether or not i) the rows are linearly 
dependent; ii) the columns are linearly dependent. 


1 2 3 
a) E : AE bd 12 3 4], oc) 
345 


VW N RE 
ew do 


2. Let V be a vector space. Show that if {v1, va, v3, v4} C V is linearly 
independent, then {v1 — v2, ve — v3, V3 — V4, V4 — U1} is linearly independent. 


3. Let $P_2$ be the vector space of all polynomials with degree less than or equal to
2. Let $W = \{p \in P_2 : p(1) = 0\}$. Show that $W$ is a subspace of $P_2$ and find a
basis of $W$.


4. Let $P_3$ be the vector space of all polynomials with degree less than or equal to
3. Let $W = \{p \in P_3 : p(1) = p(2) = 0\}$. Show that $W$ is a subspace of $P_3$ and
find a basis of $W$.


5. Let $S = \{v_1, v_2, \cdots, v_n\}$ be a set of nonzero vectors in a vector space $V$
with the orthogonal property that

$$v_i \cdot v_j = 0, \quad \text{if } i \neq j.$$

Show that $S$ is linearly independent in $V$.


6. Let $S$ be the plane $x - 2y + 3z = 0$ in $\mathbb{R}^3$. i) Find the normal of $S$; ii) Show
that $S$ is a subspace of $\mathbb{R}^3$; iii) Show that $\mathbb{R}^3 = S + \operatorname{span}(n)$ and $S \cap \operatorname{span}(n) =
\{0\}$, where $n$ is the normal of $S$. See Exercise 3.1.13 for the definition of set
addition.


7. Let $S$ be the plane $x - 2y = 0$ in $\mathbb{R}^3$. i) Find the normal of $S$; ii) Show that
$S$ is a subspace of $\mathbb{R}^3$; iii) Show that $\mathbb{R}^3 = S + \operatorname{span}(n)$ and $S \cap \operatorname{span}(n) = \{0\}$,
where $n$ is the normal of $S$.


8. Let

$$B = \{(1, 1, 1),\ (1, 1, 0),\ (1, 0, 0)\}$$

and

$$B' = \{(0, 1, -1),\ (1, -1, 0),\ (-1, 0, 0)\}$$

be two bases of $\mathbb{R}^3$ (why are they bases?). i) Find the transition matrix $P_{BB'}$;
ii) If a vector $x$ has coordinate $[x]_B = (-1, 1, 0)$, find $[x]_{B'}$.


9. Let

$$B = \{(1, 0, 1),\ (0, 1, 1),\ (1, 1, 0)\}$$

and

$$B' = \{(1, 0, -1),\ (1, -1, 0),\ (-1, 0, 2)\}$$

be two bases of $\mathbb{R}^3$. i) Find the transition matrix $P_{BB'}$; ii) If a vector $x$ has
coordinate $[x]_B = (-1, 1, 1)$, find $[x]_{B'}$.


10. Let


Suppose $P$ is the transition matrix from the basis $B$ to the standard basis
$S = \{e_1, e_2, e_3\}$ of $\mathbb{R}^3$, and $Q$ is the transition matrix from the basis $B'$ to
the standard basis $S$. i) Find the transition matrix from $B$ to $B'$; ii) Find the
bases $B$ and $B'$.


11. Let $f(x) = 2x_1^2 + 2x_2^2 + 2x_3^2 + 4x_1x_2 + 4x_2x_3 + 4x_3x_1$. Find a change of
variables $x = Py$, with $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$, where $P$ is an
invertible $3 \times 3$ matrix such that

$$f(Py) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \lambda_3 y_3^2$$

for some $\lambda_1, \lambda_2, \lambda_3 \in \mathbb{R}$.


12. Let $A$ be an $m \times n$ matrix with $m < n$. Show that the columns of $A$ are
linearly dependent.


13. Let $S$ be a linearly independent set in a vector space $V$. If $x \in V$ but
$x \notin \operatorname{span}(S)$, then $S \cup \{x\}$ is linearly independent.


14. Let $S$ be a set of vectors in a vector space $V$. If $x \in S$ and $x \in \operatorname{span}(S \setminus
\{x\})$, then $\operatorname{span}(S) = \operatorname{span}(S \setminus \{x\})$.


15. Show that the vector space $F$ of all continuous functions $f : \mathbb{R} \to \mathbb{R}$ is
infinite dimensional.


16. Show that every subspace of $\mathbb{R}^n$ is the nullspace of a matrix.


17. Let $\{v_1, v_2, \cdots, v_n\}$ be a basis of $\mathbb{R}^n$. Show that for every $r < n$, $r \in \mathbb{N}$,
if $s \neq t$ with $s, t \in \mathbb{R}$, then

$$\operatorname{span}\{v_1, v_2, \cdots, v_{r-1}, v_r + s v_n\} \neq \operatorname{span}\{v_1, v_2, \cdots, v_{r-1}, v_r + t v_n\}.$$


18. Show that for every $r < n$, $r \in \mathbb{N}$, $\mathbb{R}^n$ has infinitely many $r$-dimensional
subspaces.

4.1 Orthogonality of the four subspaces


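Example 4.1.1. (Example 3.5.3 revisited.) Let $E = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 :
x_1 - 2x_2 + 3x_3 = 0\}$ be a subset of $\mathbb{R}^3$. Then $E$ is a plane in $\mathbb{R}^3$ passing through
$(0, 0, 0)$. For every $x = (x_1, x_2, x_3) \in E$ we have $x_1 = 2x_2 - 3x_3$ and

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2x_2 - 3x_3 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2x_2 \\ x_2 \\ 0 \end{bmatrix} + \begin{bmatrix} -3x_3 \\ 0 \\ x_3 \end{bmatrix}
= x_2 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}.$$

That is, $E = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} \right\}$. Moreover, $E$ can be rewritten as

$$E = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x \perp (1, -2, 3)\},$$

which means $E$ is the set of all vectors which are orthogonal to the given
vector $(1, -2, 3)$ and hence orthogonal to every vector from the subspace $F =
\{k(1, -2, 3) : k \in \mathbb{R}\}$. Namely, the subspaces $E$ and $F$ of $\mathbb{R}^3$ are orthogonal.
We write $E \perp F$. $\square$

Definition 4.1.2. Let $V$ and $W$ be subspaces of $\mathbb{R}^n$. If for every
$(x, y) \in V \times W$ we have

$$x \cdot y = 0,$$

$V$ and $W$ are said to be orthogonal, denoted $V \perp W$.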


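Remark 4.1.3. For a general real vector space $V$, orthogonality can be
established if we can define a product $\langle \cdot, \cdot \rangle$ between elements which satisfies the
following properties,

1) $\langle u, v \rangle = \langle v, u \rangle$;
2) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$;
3) $\langle ku, v \rangle = k \langle u, v \rangle$;
4) $\langle u, u \rangle \geq 0$, with equality if and only if $u = 0$,

for every $u, v, w \in V$ and for every scalar $k \in \mathbb{R}$.

We call $\langle \cdot, \cdot \rangle$ an inner product and call $V$ an inner product space.
$x, y \in V$ are said to be orthogonal if $\langle x, y \rangle = 0$.

For example, for $p, q \in P_n$, with $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ and
$q(x) = b_0 + b_1x + b_2x^2 + \cdots + b_nx^n$, we can define an inner product between $p$
and $q$ by

$$\langle p, q \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n.$$

Then $P_n$ becomes an inner product space. $p, q \in P_n$ are said to be orthogonal
if

$$\langle p, q \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n = 0.$$

In what follows in this chapter, we assume that a vector space is equipped
with an inner product. $\square$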
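The coefficient inner product on polynomials described in the remark can be tried out numerically. The following is a minimal sketch, assuming polynomials are stored as coefficient lists `[a0, a1, ..., an]`; the function name `poly_inner` and the sample polynomials are illustrative choices, not part of the text.

```python
from itertools import zip_longest

def poly_inner(p, q):
    """Coefficient inner product <p, q> = a0*b0 + a1*b1 + ... + an*bn.

    p[i] and q[i] are the coefficients of x**i; a missing coefficient
    is treated as zero, so the two lists may have different lengths.
    """
    return sum(a * b for a, b in zip_longest(p, q, fillvalue=0))

p = [1, 2, 0]    # p(x) = 1 + 2x
q = [2, -1, 5]   # q(x) = 2 - x + 5x^2

# <p, q> = 1*2 + 2*(-1) + 0*5 = 0, so p and q are orthogonal
# with respect to this inner product.
orthogonal = poly_inner(p, q) == 0
```

Under this inner product, $P_n$ behaves exactly like $\mathbb{R}^{n+1}$ with the dot product applied to the coefficient vectors.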


Let $A$ be an $m \times n$ matrix. Consider $N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$. For every
$x \in N(A)$ we have

$$Ax = \begin{bmatrix} (\text{Row 1 of } A) \cdot x \\ (\text{Row 2 of } A) \cdot x \\ \vdots \\ (\text{Row } m \text{ of } A) \cdot x \end{bmatrix} = 0.$$

That is, $x$ is orthogonal to every row of $A$, and hence to every linear
combination of the rows. Therefore, we have

$$N(A) \perp R(A).$$

Recall that the counting theorem says

$$\dim R(A) + \dim N(A) = n.$$

Notice that both $R(A)$ and $N(A)$ are subspaces of $\mathbb{R}^n$. A natural question is
whether every $x \in \mathbb{R}^n$ can be split into two pieces $x_r \in R(A)$ and $x_n \in N(A)$
such that $x = x_r + x_n$. In particular, if $x \in \mathbb{R}^n$ and $x \perp N(A)$, is $x \in R(A)$?

Let us take a basis $S_r = \{v_1, v_2, \cdots, v_r\}$ for $R(A)$ and a basis $S_n =
\{w_1, w_2, \cdots, w_{n-r}\}$ for $N(A)$, where $r = \operatorname{rank}(A)$. If $S_r \cup S_n$ is a basis for $\mathbb{R}^n$,
then the answer is affirmative. Consider the vector equation

$$c_1v_1 + c_2v_2 + \cdots + c_rv_r + d_1w_1 + d_2w_2 + \cdots + d_{n-r}w_{n-r} = 0. \tag{4.1}$$

We show that $S_r \cup S_n$ is linearly independent. Suppose that (4.1) holds and put
$x = c_1v_1 + \cdots + c_rv_r \in R(A)$ and $y = d_1w_1 + \cdots + d_{n-r}w_{n-r} \in N(A)$, so that

$$x + y = 0.$$

Then $x = -y$, that is, $x, y \in R(A) \cap N(A)$. Note that we have $R(A) \cap N(A) = \{0\}$, since
$R(A) \perp N(A)$ and the only vector orthogonal to itself is the zero vector. Therefore, $x = y = 0$,
that is,

$$c_1v_1 + c_2v_2 + \cdots + c_rv_r = 0 \quad \text{and} \quad d_1w_1 + d_2w_2 + \cdots + d_{n-r}w_{n-r} = 0.$$

Since $S_r$ is a basis for $R(A)$ and $S_n$ is a basis for $N(A)$, all the coefficients $c_i$ and $d_j$
are zero. Hence $S_r \cup S_n$ is a linearly independent set of $n$ vectors and therefore a basis for $\mathbb{R}^n$.

Returning to our original question, every $x \in \mathbb{R}^n$ can be represented
in the basis $S_r \cup S_n$ as

$$x = c_1v_1 + c_2v_2 + \cdots + c_rv_r + d_1w_1 + d_2w_2 + \cdots + d_{n-r}w_{n-r}. \tag{4.2}$$

Put $x_r = c_1v_1 + c_2v_2 + \cdots + c_rv_r$ and $x_n = d_1w_1 + d_2w_2 + \cdots + d_{n-r}w_{n-r}$. We obtain
the split:

$$x = x_r + x_n,$$

where $x_r \in R(A)$ and $x_n \in N(A)$. The next question is, is the split unique?
Suppose not. We have $x = x_r + x_n = x_r' + x_n'$, where $x_r' \in R(A)$ and $x_n' \in N(A)$.
Then we have

$$x_r - x_r' = x_n' - x_n \in R(A) \cap N(A),$$

and $x_r - x_r' = x_n' - x_n = 0$. The split is unique. In summary we have


Theorem 4.1.4. Let $A$ be an $m \times n$ matrix. Then for every $x \in \mathbb{R}^n$,
there exists a unique orthogonal decomposition

$$x = x_r + x_n,$$

where $x_r \in R(A)$ and $x_n \in N(A)$. That is,

$$\mathbb{R}^n = R(A) \oplus N(A), \quad \text{and} \quad N(A) \perp R(A),$$

where $R(A) \oplus N(A)$ is the direct sum of the subspaces $R(A)$ and
$N(A)$ of $\mathbb{R}^n$ whose only common element is the zero vector.

Definition 4.1.5. Let $W$ be a subspace of the vector space $V$. Then
$W^\perp$ defined by

$$W^\perp = \{x \in V : x \perp y \text{ for every } y \in W\}$$

is also a subspace of $V$ and is called the orthogonal complement
of $W$.


From Theorem 4.1.4, we know that 


7 J . We split x = (4, 3) into two parts with 
$x = x_r + x_n$, where $x_r \in R(A)$ and $x_n \in N(A)$.
Solving $Ax = 0$ we obtain $N(A) = \operatorname{span}\{(-2, 1)\}$. Let $x_n = t(-2, 1)$ with
$t \neq 0$. Note that


Example 4.1.6. Let A = 


$$x - x_n \perp x_n.$$

We have

$$(x - t(-2, 1)) \cdot t(-2, 1) = 0,$$

which leads to $t = -1$. We have

$$x_n = (2, -1), \quad x_r = x - x_n = (2, 4).$$
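The split of $x = (4, 3)$ along the nullspace direction $(-2, 1)$ can be checked with a few lines of exact arithmetic. This is only a sketch: the helper name `split_along` is an assumption introduced here, and the code uses only the nullspace direction, not the matrix $A$ itself.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def split_along(x, n):
    """Return (x_r, x_n) with x_n in span{n} and x_r orthogonal to n.

    The scalar t solves (x - t*n) . n = 0, i.e. t = (x . n) / (n . n).
    """
    t = Fraction(dot(x, n), dot(n, n))
    x_n = [t * c for c in n]
    x_r = [a - b for a, b in zip(x, x_n)]
    return x_r, x_n

x_r, x_n = split_along([4, 3], [-2, 1])
# Here t = (4*(-2) + 3*1) / 5 = -1, so x_n = (2, -1) and x_r = (2, 4).
```

The two pieces are orthogonal, and $x_r = (2, 4)$ is a multiple of $(1, 2)$, which is perpendicular to the nullspace direction.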


Corollary 4.1.7. Let $A$ be an $m \times n$ matrix. Then for every $x \in \mathbb{R}^m$,
there exists a unique orthogonal decomposition

$$x = x_r + x_n,$$

where $x_r \in C(A)$ and $x_n \in N(A^T)$. That is,

$$\mathbb{R}^m = C(A) \oplus N(A^T), \quad \text{and} \quad N(A^T) \perp C(A).$$


An important interpretation of Corollary 4.1.7 is the so-called Fredholm
alternative:

Corollary 4.1.8. If $b \in \mathbb{R}^m$ is not in the column space of the $m \times n$
matrix $A$, then it is not orthogonal to $N(A^T)$. Using matrix language,
we have
either system

$$Ax = b$$

or system

$$A^Ty = 0, \quad \text{with } y^Tb = 1$$

has a solution.


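Example 4.1.9. Consider

$$\begin{cases} x + 2y + 3z = 0 \\ 4x + 5y + 6z = 1 \\ 7x + 8y + 9z = 1. \end{cases} \tag{4.3}$$

Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$ and $b = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$. We have the augmented matrix,

$$[A : b] = \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 4 & 5 & 6 & 1 \\ 7 & 8 & 9 & 1 \end{array}\right]
\xrightarrow[R_3 - 7R_1]{R_2 - 4R_1}
\left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 0 & -3 & -6 & 1 \\ 0 & -6 & -12 & 1 \end{array}\right]
\xrightarrow{R_3 - 2R_2}
\left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 0 & -3 & -6 & 1 \\ 0 & 0 & 0 & -1 \end{array}\right],$$

which shows that $\operatorname{rank}(A) \neq \operatorname{rank}([A : b])$ and hence $Ax = b$ has no solution.
By Corollary 4.1.8, system

$$A^Ty = 0, \quad \text{with } y^Tb = 1$$

has a solution. Indeed, we have

$$A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\xrightarrow[R_3 - 3R_1]{R_2 - 2R_1}
\begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & -6 & -12 \end{bmatrix}
\xrightarrow{R_3 - 2R_2}
\begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{bmatrix},$$

which has a one-dimensional nullspace $N(A^T) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \right\}$. Let $y =
t \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$, $t \in \mathbb{R}$, and solve

$$y^Tb = 1 \iff t \begin{bmatrix} 1 & -2 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = 1 \iff -t = 1.$$

We have $t = -1$. That is, $A^Ty = 0$ with $y^Tb = 1$ has a solution $y = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}$. $\square$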
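The second branch of the Fredholm alternative in Example 4.1.9 can be verified by direct computation. The sketch below assumes the matrix $A$ and vector $b$ of system (4.3) and the nullspace direction $(1, -2, 1)$ of $A^T$; the helper names are illustrative. It scales the nullspace vector so that $y^Tb = 1$ and then checks $A^Ty = 0$.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [0, 1, 1]
v = [1, -2, 1]            # spans N(A^T)

# Choose t so that (t*v) . b = 1; here v . b = -1, so t = -1.
t = Fraction(1, dot(v, b))
y = [t * c for c in v]

# Each row of A^T dotted with y should vanish.
At_y = [dot(row, y) for row in transpose(A)]
```

Since $v \cdot b = -1 \neq 0$, the vector $b$ is not orthogonal to $N(A^T)$, which is consistent with $Ax = b$ having no solution.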
Exercise 4.1.10.

1. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$. i) Find $N(A)$ and split $x = (1, 1, 1)$ into $x = x_r + x_n$
with $x_n \in N(A)$ and $x_r \in R(A)$; ii) Find the orthogonal complement of $N(A)$.


de >: 
2. Let A= |2 3 4]. i) Find N(4) and split x= (1, 1, 1) into £ = £r + £n 
3 4 5 
a 


with £n € N(A) and x, € R(A); ii) Find the orthogonal complement of N (A). 
3. Let S be the plane z — z = 0 in R®. Find the orthogonal complement of $. 


4. Let S be the plane x + y+ z = 0 in R3. Find the orthogonal complement 
of S. 


5. Show that every triangle inscribed in a semicircle is a right triangle. 
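6. Show that the function $\langle \cdot, \cdot \rangle : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ defined by
$$\langle x, y \rangle = 2x_1y_1 - x_2y_1 - x_1y_2 + 2x_2y_2$$
is an inner product on $\mathbb{R}^2$, where $x = (x_1, x_2) \in \mathbb{R}^2$ and $y = (y_1, y_2) \in \mathbb{R}^2$.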


7. Define inner product on the vector space of polynomials P,, by 


1 
(fa = | FODA 

i) Show that W = {p € Py: p(1) = 0} is a subspace of Pa. 

ii) Find the orthogonal complement of W in P4. 

8. Let A and B be subspaces of a real inner product space. Show that 


At N BŁ = (AU B)+Ł, (AN B)+ = A+ U BŁ. 
9. Let $\beta \in \mathbb{R}^n$ be a nonzero vector. Show that i) $V = \{x : x \cdot \beta = 0\}$ is a
subspace of $\mathbb{R}^n$; ii) $\dim V = n - 1$.


10. Let $C([a, b]; \mathbb{R})$ denote the set of all real-valued continuous functions
defined on $[a, b]$. i) Show that

$$\langle f, g \rangle = \int_a^b f(x)g(x)\,dx$$

defines an inner product on $C([a, b]; \mathbb{R})$.

ii) Show that $\cos mx$ and $\sin nx$ are orthogonal in $C([0, 2\pi]; \mathbb{R})$ if $m \neq n$.

iii) Show that $\cos mx$ and $\cos nx$ are orthogonal in $C([0, 2\pi]; \mathbb{R})$ if $m \neq n$.

iv) Show that $\sin mx$ and $\sin nx$ are orthogonal in $C([0, 2\pi]; \mathbb{R})$ if $m \neq n$.

v) Let W = {1, cosnz, sinnx}_,. Suppose f € C([0, 27];R) is a linear 
combination of the vectors in W. Find the linear combination coefficients. 


11. Let $W \subset M_{nn}$ be the set of all skew-symmetric matrices, $V \subset M_{nn}$ be
the set of all symmetric matrices. Show that $M_{nn} = V \oplus W$.


12. Let $F$ denote the vector space of all functions $f : \mathbb{R} \to \mathbb{R}$. Let $W \subset F$ be
the set of all even functions, $V \subset F$ be the set of all odd functions. Show that
$F = V \oplus W$.


13. Let $A \in M_{nn}$ be such that $A^2 = A$ and $A^T = A$. Show that $\mathbb{R}^n =
C(A) \oplus C(I - A)$ and $C(A) \perp C(I - A)$, where $C(A)$ denotes the column
space of $A$.

14. Let $A \in M_{nn}$ be such that $A^2 = -A$ and $A^T = A$. Show that $\mathbb{R}^n =
C(A) \oplus C(I + A)$ and $C(A) \perp C(I + A)$, where $C(A)$ denotes the column
space of $A$.

15. Let $A$ be an $n \times n$ matrix with rank $n$. Suppose $A$ is partitioned into two
parts $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$. Let $W_1 = \{x \in \mathbb{R}^n : A_1x = 0\}$ and $W_2 = \{x \in \mathbb{R}^n : A_2x = 0\}$.
Show that $\mathbb{R}^n = W_1 \oplus W_2$.


16. Let $A$ be an orthogonal matrix. Define a map $T_A : \mathbb{R}^n \to \mathbb{R}^n$ by $T_A(x) =
Ax$. i) Show that for every $x, y \in \mathbb{R}^n$, we have $(Ax) \cdot (Ay) = x \cdot y$; ii) Show that
if $V \subset \mathbb{R}^n$ is an invariant subspace with respect to $T_A$, that is, $T_A(V) \subset V$,
then $V^\perp$ is also invariant with respect to $T_A$.


17. Let $P_\infty$ be the set of all polynomials with real coefficients. Define the
inner product on $P_\infty$ by

$$\langle f, g \rangle = a_0b_0 + a_1b_1 + \cdots + a_rb_r,$$

where $f(x) = a_0 + a_1x + \cdots + a_mx^m$, $g(x) = b_0 + b_1x + \cdots + b_nx^n$ and
$r = \max\{m, n\}$. Let $V = \{f \in P_\infty : f(0) = 0\}$. i) Show that $V$ is a subspace
of $P_\infty$; ii) Find the orthogonal complement $V^\perp$ of $V$ in $P_\infty$; iii) Define a map
$T : P_\infty \to P_\infty$ by $T(f)(x) = xf(x)$. Show that $T(V) \subset V$ but $T(V^\perp) \not\subset V^\perp$.


18. Verify the Fredholm alternative for system $Ax = b$ with

$$A = \begin{bmatrix} 1 & 3 & 5 \\ 7 & 9 & 11 \\ 13 & 15 & 17 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix}.$$

4.2 Projections


In the last example of the previous section, we projected an arbitrary
vector $x \in \mathbb{R}^n$ onto a line in order to obtain a split $x = x_r + x_n$ such that
$x_r \perp x_n$. With such a split we have $\|x\|^2 = \|x_r\|^2 + \|x_n\|^2$ and $x_n$ is the best
approximation for $x$ from vectors in the line where $x_n$ lies. In this section
we deepen the understanding of the mechanisms of projections onto general
subspaces of $\mathbb{R}^n$ and their matrix representations. In the next section, we
discuss how to find the best approximation for a given vector using projections.
In what follows, by projection of a vector $x$ in an (inner product) vector space
$V$ onto a subspace $W$ we mean a map $P : V \to W \subset V$ such that

$$x - Px \perp W.$$


FIGURE 4.1: A projection $P$ from $V$ to the subspace $W$ satisfies $b - Pb \perp W$.


Example 4.2.1. For $b = (b_1, b_2, b_3) \in \mathbb{R}^3$, the projections of $b$ onto subspaces
such as the $x$-axis, $y$-axis, $z$-axis, the $xy$-plane, $xz$-plane and $yz$-plane can be
obtained by picking out the nonzero coordinates in the subspaces leaving the
others zero. Note that to show a vector is orthogonal to a subspace, we need
only to show it is orthogonal to every vector of a basis of the subspace.

1) The projection $P$ from $\mathbb{R}^3$ to the $x$-axis is given by

$$Pb = (b_1, 0, 0),$$

with matrix representation

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Indeed we have $b - Pb = (0, b_2, b_3)$ and the $x$-axis is spanned by $(1, 0, 0)$.
Moreover, $b - Pb \perp (1, 0, 0)$ and hence $b - Pb$ is orthogonal to the $x$-axis.

2) The projection $P$ from $\mathbb{R}^3$ to the $xy$-plane is given by

$$Pb = (b_1, b_2, 0),$$

with matrix representation

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Indeed we have $b - Pb = (0, 0, b_3)$ and the $xy$-plane is spanned by
$(1, 0, 0)$ and $(0, 1, 0)$. Moreover $b - Pb \perp (1, 0, 0)$ and $b - Pb \perp (0, 1, 0)$.
Hence $b - Pb$ is orthogonal to the $xy$-plane.

In both cases the projection matrix $P$ satisfies $P^2 = P$.


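Projection onto a line

Let $b \in \mathbb{R}^n$ and $l$ be the line spanned by the nonzero vector $a \in \mathbb{R}^n$.
Then the projection $Pb$ in $l$ is a scalar multiple of $a$. Suppose $Pb = \hat{c}a$. With
$b - Pb \perp a$ we have $a \cdot (b - \hat{c}a) = 0$ and

$$\hat{c} = \frac{a \cdot b}{a \cdot a} = \frac{a^Tb}{a^Ta}.$$

That is, $Pb = \hat{c}a = \frac{a^Tb}{a^Ta}a$. To obtain the representation matrix, we have

$$Pb = \frac{a^Tb}{a^Ta}a = a\frac{a^Tb}{a^Ta} = \frac{aa^T}{a^Ta}b.$$

Example 4.2.3. Project $b = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$ onto the line $l$ through $a = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$ to find
$Pb$ such that $b - Pb \perp l$.

Solution: The projection matrix is

$$P = \frac{aa^T}{a^Ta} = \frac{1}{77} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \begin{bmatrix} 4 & 5 & 6 \end{bmatrix} = \frac{1}{77} \begin{bmatrix} 16 & 20 & 24 \\ 20 & 25 & 30 \\ 24 & 30 & 36 \end{bmatrix}.$$

Then $Pb = \frac{1}{77} \begin{bmatrix} 16 & 20 & 24 \\ 20 & 25 & 30 \\ 24 & 30 & 36 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \frac{1}{77} \begin{bmatrix} -8 \\ -10 \\ -12 \end{bmatrix}$. By Lemma 4.2.2, we have
$b - Pb \perp l$. $\square$

Note the projection matrix $P = \frac{aa^T}{a^Ta}$ satisfies

$$P^2 = \frac{aa^T}{a^Ta} \cdot \frac{aa^T}{a^Ta} = \frac{a(a^Ta)a^T}{(a^Ta)(a^Ta)} = \frac{aa^T}{a^Ta} = P.$$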
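The formula $P = aa^T / a^Ta$ is easy to reproduce with exact fractions. The following sketch recomputes Example 4.2.3; the function names `line_projector`, `matvec` and `matmul` are illustrative helpers written for this check rather than anything defined in the text.

```python
from fractions import Fraction

def line_projector(a):
    """Projection matrix a a^T / (a^T a) onto the line spanned by a."""
    aa = sum(c * c for c in a)
    return [[Fraction(ai * aj, aa) for aj in a] for ai in a]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(M, N):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

a = [4, 5, 6]
b = [1, 0, -1]
P = line_projector(a)
Pb = matvec(P, b)                       # (-8/77, -10/77, -12/77)
residual = [x - y for x, y in zip(b, Pb)]
# The residual b - Pb is orthogonal to a, and P is idempotent.
residual_dot_a = sum(r * c for r, c in zip(residual, a))
is_idempotent = matmul(P, P) == P
```

Working with `Fraction` instead of floating point keeps the checks $P^2 = P$ and $(b - Pb) \cdot a = 0$ exact.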
Projection onto a subspace

Let $b \in \mathbb{R}^n$ and $W$ be a subspace with a basis $S = \{a_1, a_2, \cdots, a_m\}$. Then
the projection $Pb$ in $W$ satisfies that

1) $Pb$ is a linear combination of the basis $S = \{a_1, a_2, \cdots, a_m\}$. That is,
there exists $\hat{c} = (\hat{c}_1, \hat{c}_2, \cdots, \hat{c}_m) \in \mathbb{R}^m$ such that

$$Pb = \hat{c}_1a_1 + \hat{c}_2a_2 + \cdots + \hat{c}_ma_m = [a_1 : a_2 : \cdots : a_m]\hat{c}.$$

2) $b - Pb$ is orthogonal to $W$ if and only if $b - Pb$ is orthogonal to every
vector in the basis $S = \{a_1, a_2, \cdots, a_m\}$. That is,

$$\begin{bmatrix} a_1 \cdot (b - Pb) \\ a_2 \cdot (b - Pb) \\ \vdots \\ a_m \cdot (b - Pb) \end{bmatrix} = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} (b - Pb) = 0.$$

Let $A = [a_1 : a_2 : \cdots : a_m]$. We have $Pb = A\hat{c}$ and $A^T(b - A\hat{c}) = A^Tb -
A^TA\hat{c} = 0$. Notice that by Theorem 2.5.6, $A^TA$ is invertible since the columns of $A$
are a basis and hence linearly independent. Therefore, we have

$$\hat{c} = (A^TA)^{-1}A^Tb,$$

and

$$Pb = A\hat{c} = A(A^TA)^{-1}A^Tb.$$

In summary, we have

Lemma 4.2.4. Let $W$ be a subspace of $\mathbb{R}^n$ with a basis $S =
\{a_1, a_2, \cdots, a_m\}$ and $A = [a_1 : a_2 : \cdots : a_m]$. Then for every $b \in \mathbb{R}^n$,
the projection of $b$ onto the subspace $W$ is

$$Pb = A(A^TA)^{-1}A^Tb,$$

and the representation matrix of the projection is

$$P = A(A^TA)^{-1}A^T.$$


Example 4.2.5. Let 


b= 


OPOR 
Rita 


1 
0 
0 ? 
1 


OO BR 


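Find the orthogonal projection $Pb$ of $b$ onto the column space of $A$.
Solution: We notice that the columns of $A$ are linearly independent because
$\operatorname{rank}(A) = 3$ and $A$ has exactly three columns. Then Lemma 4.2.4 applies. We
solve the system $A^TA\hat{c} = A^Tb$ where

$$A^TA = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$

Row reducing $[A^TA : I]$ to $[I : (A^TA)^{-1}]$, beginning with the operation $R_1 + R_2 + R_3$,
we find that the inverse of $A^TA$ is

$$(A^TA)^{-1} = \begin{bmatrix} \frac{3}{4} & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{3}{4} & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & \frac{3}{4} \end{bmatrix}.$$

Therefore, the orthogonal projection $Pb$ of $b$ onto the column space of $A$ is

$$Pb = A\hat{c} = A(A^TA)^{-1}A^Tb = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{3}{4} & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{3}{4} & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & \frac{3}{4} \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} \frac{3}{2} \\ \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}.$$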
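The projection $Pb = A(A^TA)^{-1}A^Tb$ of Lemma 4.2.4 can be computed without forming the inverse, by solving $A^TA\hat{c} = A^Tb$ directly. The sketch below is one way to do this with exact fractions. The matrix `A` has columns $(1,1,0,0)$, $(1,0,1,0)$, $(1,0,0,1)$; the right-hand side `b = (1, 1, 1, 1)` is an assumption made only for illustration, and the helper names `solve` and `project` are hypothetical.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M c = rhs by Gauss-Jordan elimination.

    M is assumed invertible; all arithmetic is exact.
    """
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def project(A, b):
    """Return (c_hat, Pb) for the projection of b onto the column space of A.

    The columns of A are assumed to be linearly independent.
    """
    At = [list(col) for col in zip(*A)]
    AtA = [[sum(x * y for x, y in zip(r, c)) for c in At] for r in At]
    Atb = [sum(x * y for x, y in zip(r, b)) for r in At]
    c_hat = solve(AtA, Atb)
    Pb = [sum(a * c for a, c in zip(row, c_hat)) for row in A]
    return c_hat, Pb

A = [[1, 1, 1],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
b = [1, 1, 1, 1]          # assumed right-hand side, for illustration only
c_hat, Pb = project(A, b)
```

With this choice of `b`, $A^Tb = (2, 2, 2)$, the coefficients are $\hat{c} = (1/2, 1/2, 1/2)$ and $Pb = (3/2, 1/2, 1/2, 1/2)$.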


Remark 4.2.6. We remark that the projection matrix $P = A(A^TA)^{-1}A^T$
satisfies

$$P^2 = A(A^TA)^{-1}A^TA(A^TA)^{-1}A^T = A(A^TA)^{-1}A^T = P,$$
$$P^T = (A(A^TA)^{-1}A^T)^T = A((A^TA)^{-1})^TA^T = A(A^TA)^{-1}A^T = P.$$

That is,

$P^2 = P$ and $P$ is symmetric.

Then we have

$$P(b - Pb) = Pb - P^2b = Pb - Pb = 0,$$

which implies that $b - Pb \in N(P)$. The symmetry of $P$ implies that the row
space of $P$ is the same as the column space of $P$. Then for every $b \in \mathbb{R}^n$, $b$ has
an orthogonal decomposition

$$b = b_n + b_r,$$

with $b_n = b - Pb \in N(P)$ and $b_r = Pb \in C(P) = R(P)$.


Exercise 4.2.7. 


1 0 
1. Let W = span | |—1 1| |. Find the projection matrix P for the pro- 
2 


bi 


jection onto W. 


2. Find the orthogonal projection of the vector b = (1, 2, 4) € R? onto the 
line 
„EU 
2° sp Ae 
and find the distance from the point (1, 2, 4) to the line ]. 
3. Find the orthogonal projection of the vector $b = (1, 2, 4) \in \mathbb{R}^3$ onto the
plane $S : x - y + z = 0$ and find the distance from the point $(1, 2, 4)$ to the
plane $S$.


4. Find the orthogonal projection of the vector $b = (1, 0, 4) \in \mathbb{R}^3$ onto the
plane $S : x + y = 0$ and find the distance from the point $(1, 0, 4)$ to the plane
$S$.


5. Find the distance from $\alpha = (-1, 1, 1, 0)$ to $W = \operatorname{span}(\beta_1, \beta_2, \beta_3)$ in $\mathbb{R}^4$,
where $\beta_1 = (1, 0, 0, 1)$, $\beta_2 = (1, 1, 0, 2)$ and $\beta_3 = (1, 1, 4, 1)$.


6. Find the distance from $\alpha = (-1, 1, 1, 0)$ to $W = \operatorname{span}(\beta_1, \beta_2, \beta_3)$ in $\mathbb{R}^4$,
where $\beta_1 = (1, 0, 0, 1)$, $\beta_2 = (1, 1, 0, 2)$ and $\beta_3 = (0, 1, 0, 1)$.


7. Let $u$ and $v$ be vectors in $\mathbb{R}^3$. Show that

$$(u + v) \cdot (u - v) = \|u\|^2 - \|v\|^2.$$

8. Show that the diagonals of a parallelogram in $\mathbb{R}^2$ with equal sides are
perpendicular to each other.


9. Let $A(0, 1, 1)$, $B(1, 1, 1)$ and $C(-1, -1, 0)$ be the vertices of the triangle
$\triangle ABC$. Find the area of $\triangle ABC$.


10. Let $u$ and $v$ be nonzero vectors in $\mathbb{R}^3$. Show that the area of the triangle
determined by $u$ and $v$ is

$$\frac{1}{2}\sqrt{\|u\|^2\|v\|^2 - (u \cdot v)^2}.$$


11. (Parallelogram law) Let $u$ and $v$ be vectors in $\mathbb{R}^n$. Show that

$$2(\|u\|^2 + \|v\|^2) = \|u + v\|^2 + \|u - v\|^2.$$


12. Let $A$ be an $m \times n$ matrix with linearly independent rows. Find the
representation matrix $P$ of the orthogonal projection onto the row space of $A$.


13. Let $s$ and $l$ be lines in $\mathbb{R}^3$ given by


Find the distance $d$ between $s$ and $l$:

$$d = \min_{p \in s,\ q \in l} \|p - q\|.$$


14. Let $W$ be a subspace of $\mathbb{R}^n$. Show that the orthogonal projection $P : \mathbb{R}^n \to
W$ is a linear map.


15. What is the representation matrix $P$ for the projection from $\mathbb{R}^n$ onto
itself?


16. Let $P$ be the representation matrix for the projection onto a proper
subspace $W$ of $\mathbb{R}^n$. i) For every $x \in W$, is it true that $Px = x$? ii) Is $P$ invertible?
Explain your answer.


17. Let $P$ be the representation matrix for the projection onto a proper
subspace $W$ of $\mathbb{R}^n$. Is $P$ dependent on your choice of the basis for $W$? Explain
your answer.


18. Let $P$ be an $n \times n$ matrix and satisfy $P^T = P$ and $P^2 = P$. Is it true that
$P$ is the representation matrix for the projection onto a proper subspace $W$ of
$\mathbb{R}^n$? Namely, for every $b \in \mathbb{R}^n$, is $b - Pb$ orthogonal to $Pb$?


19. Let P be the representation matrix for the projection onto a subspace W 
of R”. Is it true that W equals the column space of P? 


20. Let $C([0, 2\pi]; \mathbb{R})$ denote the set of all real-valued continuous functions
defined on $[0, 2\pi]$. Let $W = \operatorname{span}\{1, \cos nx, \sin nx\}$. Define an inner product
on $C([0, 2\pi]; \mathbb{R})$ by

$$\langle f, g \rangle = \int_0^{2\pi} f(t)g(t)\,dt, \quad f, g \in C([0, 2\pi]; \mathbb{R}).$$

i) For every $f \in C([0, 2\pi]; \mathbb{R})$, find the orthogonal projection $f_p$ of $f$ onto $W$;
ii) For every $f \in C([0, 2\pi]; \mathbb{R})$, find the distance $d$ of $f$ to $W$ defined by

$$d = \langle f - f_p, f - f_p \rangle^{\frac{1}{2}},$$

where $f_p$ is the orthogonal projection of $f$ onto $W$.

iii) Let $f_0(x) = 1 + x^2$ be a function in $C([0, 2\pi]; \mathbb{R})$. Find the orthogonal
projection $f_{p0}$ of $f_0$ onto $W$ and $\langle f_0 - f_{p0}, f_0 - f_{p0} \rangle$.


FIGURE 4.2: $\|b - Pb\| \leq \|b - w\|$ for every $w \in W$.


4.3 Least squares approximations 


We have learned that a linear system $Ax = b$ does not have a solution if
$b \notin C(A)$. Let $W = C(A)$. A natural question is how to find a $b_r \in W$ which
best approximates $b$. Geometrically, the answer is the projection $b_r = Pb$ of
$b$ onto $W$, where $P$ is the representation matrix of the projection onto $W$.
Indeed, for every $w \in W$,

$$b - w = b - Pb + Pb - w,$$

where $(b - Pb) \perp (Pb - w)$ since $Pb - w \in W$ and $(b - Pb) \perp W$. By the
Pythagorean theorem, we have

$$\|b - w\|^2 = \|b - Pb\|^2 + \|Pb - w\|^2,$$

which implies that

$$\|b - Pb\| \leq \|b - w\|, \quad \text{for every } w \in W,$$

and the equality happens only if $w = Pb$.


Theorem 4.3.1. Let $W$ be a subspace of $\mathbb{R}^n$ and $P$ the representation
matrix of the projection onto $W$. Then for every $b \in \mathbb{R}^n$,

$$\|b - Pb\| \leq \|b - w\|, \quad \text{for every } w \in W,$$

and the equality happens only if $w = Pb$.
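A small numerical experiment illustrates the best-approximation property. In this sketch $W$ is the line spanned by a vector `a`, and the squared distance from `b` to several points `s*a` of $W$ is compared with the squared distance to the projection. The vectors `a` and `b` and the helper names are hypothetical choices for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def sqdist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

a = [1, 2, 2]                      # W = span{a}
b = [3, 0, 3]

c = Fraction(dot(a, b), dot(a, a))  # coefficient of the projection, here 1
Pb = [c * x for x in a]
best = sqdist(b, Pb)                # ||b - Pb||^2

# Squared distances from b to other points s*a of W.
others = {s: sqdist(b, [s * x for x in a]) for s in range(-3, 4)}
closest = [s for s, d in others.items() if d == best]
```

By the Pythagorean identity, each entry of `others` equals `best` plus $\|Pb - sa\|^2$, so the minimum is attained only at `s = 1`, where $sa = Pb$.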
Definition 4.3.2. Let $A$ be an $n \times m$ matrix and $b \in \mathbb{R}^n$. We call
$\hat{x} \in \mathbb{R}^m$ a least squares solution to system $Ax = b$ if $A\hat{x} = Pb$,
where $Pb$ is the projection of $b$ onto the column space of $A$.

Certainly $A\hat{x} = Pb$ is always consistent with at least one solution since
$Pb$ is by definition in the column space of $A$. The question is how to find $\hat{x}$
without having to find the representation matrix $P$.

Let us first find a necessary condition that $\hat{x}$ must satisfy. Notice that by
definition of projection we have $b - Pb \perp C(A) = R(A^T)$. That is, $b - Pb$ is
orthogonal to each row of $A^T$. Therefore we have

$$A^T(b - Pb) = A^T(b - A\hat{x}) = 0,$$

which is equivalent to

$$A^TA\hat{x} = A^Tb. \tag{4.4}$$

That is, if $\hat{x}$ is a least squares solution, it must satisfy (4.4), which we call the
normal system corresponding to $Ax = b$. This also shows that the normal
system $A^TA\hat{x} = A^Tb$ is always consistent.

A remaining question is whether every $\hat{x}$ satisfying the normal system (4.4) is a
solution to $A\hat{x} = Pb$ and hence a least squares solution. Indeed, if $\hat{x}$ satisfies
the normal system (4.4), $b - A\hat{x}$ is orthogonal to $C(A) = R(A^T)$. Then we
have an orthogonal decomposition of $b$:

$$b = (b - A\hat{x}) + A\hat{x}.$$

Another orthogonal decomposition of $b$ is

$$b = (b - Pb) + Pb.$$

By uniqueness of orthogonal decomposition with respect to the subspace $W =
C(A)$, we have

$$A\hat{x} = Pb,$$

which means that every solution of the normal system is a least squares
solution.


Theorem 4.3.3. Let $A$ be an $n \times m$ matrix and $b \in \mathbb{R}^n$. $\hat{x} \in \mathbb{R}^m$ is a
least squares solution to system $Ax = b$ if and only if $A^TA\hat{x} = A^Tb$.
Moreover, $A^TA\hat{x} = A^Tb$ is always consistent.


Corollary 4.3.4. Let $A$ be an $n \times m$ matrix and $b \in \mathbb{R}^n$. If the columns
of $A$ are linearly independent, then the least squares solution to system
$Ax = b$ is

$$\hat{x} = (A^TA)^{-1}A^Tb.$$

Example 4.3.5. (Example 4.2.5 revisited.) Let


b= 


1 
0 
a 1 
0 


Rth 


1 1 
1 0 
0 0}? 
0 1 


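Find a least squares solution to $Ax = b$.
Solution: We solve the system $A^TA\hat{x} = A^Tb$ where

$$A^TA = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}, \quad A^Tb = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix},$$

and

$$(A^TA)^{-1} = \begin{bmatrix} \frac{3}{4} & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{3}{4} & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & \frac{3}{4} \end{bmatrix}.$$

Therefore, the least squares solution $\hat{x}$ of $Ax = b$ is

$$\hat{x} = (A^TA)^{-1}A^Tb = \begin{bmatrix} \frac{3}{4} & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{3}{4} & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & \frac{3}{4} \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}.$$
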
Example 4.3.6. Let $(x_1, y_1), (x_2, y_2), \cdots, (x_m, y_m)$ be a set of data for the
pair $(x, y) \in \mathbb{R}^2$. Suppose that $x$ and $y$ are related by a polynomial

$$y = p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n.$$

We wish to use the observed data to solve for the unknown coefficients
$a_0, a_1, \cdots, a_n$ and set up the following equations

$$\begin{cases} y_1 = p(x_1) \\ y_2 = p(x_2) \\ \quad \vdots \\ y_m = p(x_m), \end{cases} \qquad \begin{cases} y_1 = a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_nx_1^n \\ y_2 = a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_nx_2^n \\ \quad \vdots \\ y_m = a_0 + a_1x_m + a_2x_m^2 + \cdots + a_nx_m^n, \end{cases}$$

which can be written as $Au = b$, where

$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{bmatrix}, \quad u = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.$$

Such an inhomogeneous system may not have a solution for $u$, especially
when the system is over-determined with $m > n + 1$. However, we can find a
least squares solution $\hat{u}$ of $Au = b$ which minimizes $\|b - Au\|$. Namely, a least
squares solution $\hat{u}$ of $Au = b$ is a solution to the minimization problem

$$\min_{u \in \mathbb{R}^{n+1}} \|b - Au\|,$$

and

$$\|b - A\hat{u}\| = \min_{u \in \mathbb{R}^{n+1}} \|b - Au\|.$$

By Theorem 4.3.3, every solution of $A^TA\hat{u} = A^Tb$ is a least squares solution
to $Au = b$. We remark that the process of using a set of data to fit in a
polynomial is called polynomial interpolation.
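The fitting procedure of Example 4.3.6 can be sketched in code: build the matrix $A$ from the data, form the normal system $A^TA\hat{u} = A^Tb$, and solve it. The data points below are made up for illustration, and the helper names `solve` and `fit_polynomial` are assumptions introduced here; the sketch also assumes that $A^TA$ is invertible, which holds when the data contain at least $n + 1$ distinct $x$ values.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M u = rhs by Gauss-Jordan elimination.

    M is assumed invertible; all arithmetic is exact.
    """
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fit_polynomial(xs, ys, degree):
    """Least squares coefficients [a0, ..., a_degree] via the normal system."""
    A = [[Fraction(x) ** k for k in range(degree + 1)] for x in xs]
    At = [list(col) for col in zip(*A)]
    AtA = [[sum(p * q for p, q in zip(r, c)) for c in At] for r in At]
    Atb = [sum(p * y for p, y in zip(r, ys)) for r in At]
    return solve(AtA, Atb)

# Hypothetical data: four points fitted by a line y = a0 + a1*x.
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 4]
coeffs = fit_polynomial(xs, ys, 1)   # a0 = 3/2, a1 = 1
residuals = [y - (coeffs[0] + coeffs[1] * x) for x, y in zip(xs, ys)]
```

For this data the normal system is $\begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix} \hat{u} = \begin{bmatrix} 12 \\ 23 \end{bmatrix}$, and the residual vector is orthogonal to both columns of $A$, as the normal system requires.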


Exercise 4.3.7.

1. Is it true that a least squares solution may not be unique? Explain your answer.


2. Let $A$ be an $m \times n$ matrix and $b$ be $m \times 1$. Is it true that a least squares solution
to $Ax = b$ always exists? Explain your answer.


3. Let $x_0$ be a least squares solution of $Ax = b$. Show that $b^TAx_0 \geq 0$.


4. Let

$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{bmatrix}.$$

Suppose that there are at least $n + 1$ distinct values among $(x_1, x_2, \cdots, x_m)$.
Show that the columns of $A$ are linearly independent. (Hint: check the maximal
number of distinct roots of a nontrivial polynomial $p(x) = a_0 + a_1x + a_2x^2 +
\cdots + a_nx^n$. $A$ is called a Vandermonde matrix.)


5. Use the polynomial $h = bt + c$ to fit in the data in the sense of least squares:

| h | 3 | 2 | 1 | 2 | 0 |
| t | 0 | 1 | 2 | 3 | 4 |

6. Use the polynomial $h = at^2 + bt + c$ to fit in the data in the sense of least
squares:

| h | 3 | 2 | 1 | 2 | 0 |
| t | 0 | 1 | 2 | 3 | 4 |

7. Let $Ax = b$ be consistent. i) Is it true that every least squares solution of
$Ax = b$ is also a solution of $Ax = b$? Justify your answer. ii) Is it true that
every solution of $Ax = b$ is also a least squares solution of $Ax = b$? Justify
your answer.


8. Let $n = (1, -1, 1, 0)$ be the normal of the hyperplane (see Exercise 3.1.13
for the definition) $S$ in $\mathbb{R}^4$ passing through the point $(1, 2, 3, 4)$. i) Find the
equation for $S$; ii) Find the distance from $b = (1, 2, 4, 5)$ to $S$.


9. Show that if $u_0$ is a solution of

$$\min_{u \in \mathbb{R}^n} \|b - Au\|,$$

then $A^TAu_0 = A^Tb$. That is, $u_0$ is a least squares solution of $Ax = b$.


10. Show that if $u_0$ is a least squares solution of $Ax = b$, then $u_0$ is a solution
of

$$\min_{u \in \mathbb{R}^n} \|b - Au\|.$$

4.4 Orthonormal bases and Gram-Schmidt
Orthonormal bases 


We know that if $W$ is a subspace of $\mathbb{R}^n$ with basis $S = \{a_1, a_2, \cdots, a_m\}$,
then the projection onto $W$ has a representation matrix

$$P = A(A^TA)^{-1}A^T,$$

where $A = [a_1 : a_2 : \cdots : a_m]$. The real application of the matrix $P$ will
involve the computation of the inverse of the $m \times m$ matrix $A^TA$, which is
usually not convenient. It would be a big relief if $A^TA = I$, namely, if $A$ is
an orthogonal matrix (see Definition 2.5.11). Indeed, if $A$ is an orthogonal
matrix, we have the following observations:

1) $A^TA = I$ implies that the columns of $A$ are unit vectors and are orthogonal
to each other:

$$a_i^Ta_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \tag{4.5}$$

We call $S = \{a_1, a_2, \cdots, a_m\}$ an orthonormal basis if (4.5) is
satisfied.

2) The representation matrix of the projection onto $W$ is simplified to $P =
AA^T$.

3) For every $b \in \mathbb{R}^n$, its orthogonal projection onto $W$ is

$$Pb = AA^Tb = [a_1 : a_2 : \cdots : a_m] \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} b = a_1a_1^Tb + a_2a_2^Tb + \cdots + a_ma_m^Tb.$$

Note that $a_ia_i^Tb = \frac{a_ia_i^T}{a_i^Ta_i}b$ is exactly the projection of $b$ onto the
one-dimensional subspace of $W$ spanned by $a_i$, $i = 1, 2, \cdots, m$. This implies
that if $W$ has an orthonormal basis, namely, if a basis consists of unit and
orthogonal vectors, the projection onto $W$ is the sum of the projections
along the individual vectors in the orthonormal basis.


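Orthogonal basis

Suppose that the subspace $W$ has an orthogonal basis $S =
\{a_1, a_2, \cdots, a_m\}$, whose elements are not necessarily unit vectors. Namely, $A^TA$ is
a diagonal matrix with

$$a_i^Ta_j \begin{cases} = 0 & \text{if } i \neq j \\ \neq 0 & \text{if } i = j. \end{cases} \tag{4.6}$$

We call $S = \{a_1, a_2, \cdots, a_m\}$ an orthogonal basis if (4.6) is satisfied. In
this case, the orthogonal projection onto $W$ has an easy representation. In
fact,

$$A^TA = \begin{bmatrix} a_1^Ta_1 & 0 & \cdots & 0 \\ 0 & a_2^Ta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_m^Ta_m \end{bmatrix}.$$

For every $b \in \mathbb{R}^n$, its orthogonal projection onto $W$ is

$$\begin{aligned}
Pb = A(A^TA)^{-1}A^Tb &= [a_1 : a_2 : \cdots : a_m] \begin{bmatrix} \frac{1}{a_1^Ta_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{a_2^Ta_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{a_m^Ta_m} \end{bmatrix} \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} b \\
&= \left( \frac{a_1a_1^T}{a_1^Ta_1} + \frac{a_2a_2^T}{a_2^Ta_2} + \cdots + \frac{a_ma_m^T}{a_m^Ta_m} \right) b \\
&= \frac{a_1^Tb}{a_1^Ta_1}a_1 + \frac{a_2^Tb}{a_2^Ta_2}a_2 + \cdots + \frac{a_m^Tb}{a_m^Ta_m}a_m.
\end{aligned}$$

Similar to the case of an orthonormal basis, $Pb = \frac{a_1a_1^T}{a_1^Ta_1}b + \frac{a_2a_2^T}{a_2^Ta_2}b + \cdots + \frac{a_ma_m^T}{a_m^Ta_m}b$
implies that the projection onto $W$ is the sum of the projections along the
individual vectors in the orthogonal basis. $Pb = \frac{a_1^Tb}{a_1^Ta_1}a_1 + \frac{a_2^Tb}{a_2^Ta_2}a_2 + \cdots +
\frac{a_m^Tb}{a_m^Ta_m}a_m$ indicates that the coefficient of the coordinate for $Pb$ along the
$a_i$ direction is $\frac{a_i^Tb}{a_i^Ta_i}$. In summary, we have

Lemma 4.4.1. Let $W$ be a subspace of $\mathbb{R}^n$ with an orthogonal basis
$S = \{a_1, a_2, \cdots, a_m\}$. For every $b \in \mathbb{R}^n$, its orthogonal projection
onto $W$ is

$$Pb = \left( \frac{a_1a_1^T}{a_1^Ta_1} + \frac{a_2a_2^T}{a_2^Ta_2} + \cdots + \frac{a_ma_m^T}{a_m^Ta_m} \right) b = \frac{a_1^Tb}{a_1^Ta_1}a_1 + \frac{a_2^Tb}{a_2^Ta_2}a_2 + \cdots + \frac{a_m^Tb}{a_m^Ta_m}a_m.$$

If $S$ is orthonormal, then

$$\begin{aligned}
Pb &= (a_1a_1^T + a_2a_2^T + \cdots + a_ma_m^T)b \\
&= (a_1^Tb)a_1 + (a_2^Tb)a_2 + \cdots + (a_m^Tb)a_m.
\end{aligned}$$

Note that if $W = \mathbb{R}^n$, the projection is just the identity map with $P = I$.
For every $b \in \mathbb{R}^n$, its orthogonal projection onto $W$ is

$$b = \frac{a_1^Tb}{a_1^Ta_1}a_1 + \frac{a_2^Tb}{a_2^Ta_2}a_2 + \cdots + \frac{a_m^Tb}{a_m^Ta_m}a_m,$$

which is an orthogonal decomposition of $b$ along each of the vectors in the
orthogonal basis $S = \{a_1, a_2, \cdots, a_m\}$.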
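With an orthogonal basis, the projection reduces to a sum of one-dimensional projections, so no linear system has to be solved. The following sketch implements the sum $\sum_i \frac{a_i^Tb}{a_i^Ta_i}a_i$; the basis and the vector `b` are hypothetical sample data, and the function name `project_orthogonal` is an illustrative choice.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def project_orthogonal(basis, b):
    """Project b onto span(basis), assuming the basis vectors are
    nonzero and mutually orthogonal."""
    Pb = [Fraction(0)] * len(b)
    for a in basis:
        coeff = Fraction(dot(a, b), dot(a, a))   # a^T b / a^T a
        Pb = [p + coeff * x for p, x in zip(Pb, a)]
    return Pb

# Orthogonal (not orthonormal) basis of the xy-plane in R^3.
basis = [[1, 1, 0], [1, -1, 0]]
b = [3, 1, 5]
Pb = project_orthogonal(basis, b)   # (3, 1, 0)
```

Here the coefficients are $4/2 = 2$ and $2/2 = 1$, so $Pb = 2(1, 1, 0) + (1, -1, 0) = (3, 1, 0)$, which agrees with dropping the $z$-coordinate of $b$.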

A natural question next is how to find orthogonal and orthonormal bases
from a given basis. The Gram-Schmidt process is a procedure for this purpose.
Before we detail the process of creating an orthogonal basis, let us first observe
that a set of nonzero orthogonal vectors $S = \{a_1, a_2, \cdots, a_m\}$ in $\mathbb{R}^n$ is
linearly independent. Indeed, consider the vector equation

$$c_1a_1 + c_2a_2 + \cdots + c_ma_m = 0.$$

Multiply both sides by $a_i^T$, $i = 1, 2, \cdots, m$. We have

$$a_i^T(c_1a_1 + c_2a_2 + \cdots + c_ma_m) = 0 \implies c_ia_i^Ta_i = 0,$$

which leads to $c_i = 0$ since $a_i$ is nonzero. That is

Lemma 4.4.2. If $S = \{a_1, a_2, \cdots, a_m\}$ is a set of orthogonal nonzero
vectors in $\mathbb{R}^n$, then $S$ is linearly independent.


Gram-Schmidt orthogonalization process 


Let W be a subspace of R” with an orthogonal basis S = {a1, a2, ::: , Gm}. 
We want to convert it into a new basis S’ = {b1, ba, --- , bm}. The idea is 
that for every i = 1, 2,--- ,m and beginning with bı = aı, use orthogonal 


projection to find b;+ı such that 


bi A span{ay, Q2,°°** , ai), 


94 
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and that 


spanía;, Go, +, Gia) = span{b,, ba, PESU bist}. 


When the process terminates, we obtain a set of orthogonal basis S’ = 
[b1, ba, :::, bm} for W. To be specific, we carry out the following steps. 


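1) Let $b_1 = a_1$;

2) Find an orthogonal decomposition of $a_2$ with respect to $\operatorname{span}\{b_1\} =
\operatorname{span}\{a_1\}$ and let $b_2$ be the component of $a_2$ orthogonal to $\operatorname{span}\{b_1\} =
\operatorname{span}\{a_1\}$:

\[ b_2 = a_2 - \frac{b_1 b_1^T}{b_1^T b_1} a_2 = a_2 - \frac{b_1^T a_2}{b_1^T b_1} b_1. \]

Then $b_2 \neq 0$ since $a_2 \notin \operatorname{span}\{a_1\} = \operatorname{span}\{b_1\}$. Then $\{b_1, b_2\}$ is linearly
independent since $b_2 \perp \operatorname{span}\{b_1\}$. Moreover the orthogonal set $\{b_1, b_2\}$
is contained in the two-dimensional space $\operatorname{span}\{a_1, a_2\}$. Therefore, we
have

\[ \operatorname{span}\{a_1, a_2\} = \operatorname{span}\{b_1, b_2\}. \]

3) Find an orthogonal decomposition of $a_3$ with respect to $\operatorname{span}\{b_1, b_2\} =
\operatorname{span}\{a_1, a_2\}$ and let $b_3$ be the component of $a_3$ orthogonal to
$\operatorname{span}\{b_1, b_2\} = \operatorname{span}\{a_1, a_2\}$:

\[ b_3 = a_3 - \frac{b_1 b_1^T}{b_1^T b_1} a_3 - \frac{b_2 b_2^T}{b_2^T b_2} a_3
       = a_3 - \frac{b_1^T a_3}{b_1^T b_1} b_1 - \frac{b_2^T a_3}{b_2^T b_2} b_2. \]

Then $b_3 \neq 0$ since $a_3 \notin \operatorname{span}\{a_1, a_2\} = \operatorname{span}\{b_1, b_2\}$. Then $\{b_1, b_2, b_3\}$
is linearly independent since $b_3 \perp \operatorname{span}\{b_1, b_2\}$. Moreover the
orthogonal set $\{b_1, b_2, b_3\}$ is contained in the three-dimensional space
$\operatorname{span}\{a_1, a_2, a_3\}$. Therefore, we have

\[ \operatorname{span}\{a_1, a_2, a_3\} = \operatorname{span}\{b_1, b_2, b_3\}. \]

m+1) Successively find an orthogonal decomposition of $a_{m+1}$ with respect to

\[ \operatorname{span}\{b_1, b_2, \dots, b_m\} = \operatorname{span}\{a_1, a_2, \dots, a_m\}, \]

and let $b_{m+1}$ be the component of $a_{m+1}$ which is orthogonal to
$\operatorname{span}\{b_1, b_2, \dots, b_m\}$:

\[ b_{m+1} = a_{m+1} - \frac{b_1 b_1^T}{b_1^T b_1} a_{m+1} - \frac{b_2 b_2^T}{b_2^T b_2} a_{m+1} - \cdots - \frac{b_m b_m^T}{b_m^T b_m} a_{m+1}
           = a_{m+1} - \frac{b_1^T a_{m+1}}{b_1^T b_1} b_1 - \frac{b_2^T a_{m+1}}{b_2^T b_2} b_2 - \cdots - \frac{b_m^T a_{m+1}}{b_m^T b_m} b_m. \tag{4.8} \]

Then we have $b_{m+1} \neq 0$ since

\[ a_{m+1} \notin \operatorname{span}\{a_1, a_2, \dots, a_m\} = \operatorname{span}\{b_1, b_2, \dots, b_m\}. \]

We obtain that $\{b_1, b_2, \dots, b_{m+1}\}$ is linearly independent since

\[ b_{m+1} \perp \operatorname{span}\{b_1, b_2, \dots, b_m\}. \]

Moreover the orthogonal set $\{b_1, b_2, \dots, b_{m+1}\}$ is contained in the $(m+1)$-dimensional
space $\operatorname{span}\{a_1, a_2, \dots, a_{m+1}\}$. Therefore, we have

\[ \operatorname{span}\{a_1, a_2, \dots, a_{m+1}\} = \operatorname{span}\{b_1, b_2, \dots, b_{m+1}\}. \]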
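The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dot` and `gram_schmidt` are ours, exact rational arithmetic from the standard `fractions` module is used to avoid rounding, and the three input vectors are chosen for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product u^T v of two vectors given as sequences."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors by formula (4.8).

    Each new b is the input vector a minus its projections onto the
    previously constructed b_1, ..., b_m.
    """
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for prev in basis:
            coeff = dot(prev, a) / dot(prev, prev)  # b_k^T a / b_k^T b_k
            b = [bi - coeff * pi for bi, pi in zip(b, prev)]
        if all(x == 0 for x in b):
            raise ValueError("input vectors are linearly dependent")
        basis.append(b)
    return basis

# Illustrative input: three linearly independent vectors in R^3.
V = gram_schmidt([(1, -1, 0), (-2, 3, 1), (1, 2, 4)])
for v in V:
    print([str(x) for x in v])
```

The error raised for a zero vector reflects the observation in the text that $b_{m+1} \neq 0$ exactly when $a_{m+1}$ lies outside the span of the earlier vectors.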


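Example 4.4.3. Let $W = \operatorname{span}\{w_1, w_2, w_3\}$ be a subspace of $\mathbb{R}^3$, where
$w_1 = (1, -1, 0)$, $w_2 = (-2, 3, 1)$, $w_3 = (1, 2, 4)$.

i) Show that $\{w_1, w_2, w_3\}$ is linearly independent. Then use the Gram-Schmidt
process to convert $\{w_1, w_2, w_3\}$ into an orthogonal basis $V =
\{v_1, v_2, v_3\}$ of $W$;

ii) Let $u = (1, 1, 2)$. Find the coordinate vector $[u]_V$ relative to the orthogonal
basis $V$ obtained in i).

Solution: i) To show that $\{w_1, w_2, w_3\}$ is linearly independent, we consider the vector
equation

\[ c_1 w_1 + c_2 w_2 + c_3 w_3 = 0, \]

which is equivalent to

\[ \begin{bmatrix} 1 & -2 & 1 \\ -1 & 3 & 2 \\ 0 & 1 & 4 \end{bmatrix}
   \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

Next we reduce the coefficient matrix to row echelon form to determine the
solution for $(c_1, c_2, c_3)$.

\[ \begin{bmatrix} 1 & -2 & 1 \\ -1 & 3 & 2 \\ 0 & 1 & 4 \end{bmatrix}
   \to \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 3 \\ 0 & 1 & 4 \end{bmatrix}
   \to \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
   \;\Rightarrow\; \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

That is, the vector equation has only the trivial solution for the coefficients.
Therefore $\{w_1, w_2, w_3\}$ is linearly independent.
Now we use the Gram-Schmidt process to obtain an orthogonal basis:

\[ v_1 = w_1 = (1, -1, 0); \]

\[ v_2 = w_2 - \frac{v_1^T w_2}{v_1^T v_1} v_1 = w_2 - \frac{v_1 \cdot w_2}{\|v_1\|^2} v_1
       = (-2, 3, 1) - \frac{(-5)(1, -1, 0)}{2} = \left(\tfrac{1}{2}, \tfrac{1}{2}, 1\right); \]

\[ v_3 = w_3 - \frac{v_1^T w_3}{v_1^T v_1} v_1 - \frac{v_2^T w_3}{v_2^T v_2} v_2
       = w_3 - \frac{v_1 \cdot w_3}{\|v_1\|^2} v_1 - \frac{v_2 \cdot w_3}{\|v_2\|^2} v_2
       = (1, 2, 4) - \frac{(-1)(1, -1, 0)}{2} - \frac{11/2}{3/2}\left(\tfrac{1}{2}, \tfrac{1}{2}, 1\right)
       = \left(-\tfrac{1}{3}, -\tfrac{1}{3}, \tfrac{1}{3}\right). \]

Then $\{v_1, v_2, v_3\}$ is an orthogonal basis of $W$.
ii) Since $\{v_1, v_2, v_3\}$ is an orthogonal basis, we have

\[ u = \frac{v_1^T u}{v_1^T v_1} v_1 + \frac{v_2^T u}{v_2^T v_2} v_2 + \frac{v_3^T u}{v_3^T v_3} v_3. \]

Therefore, we have

\[ [u]_V = \left( \frac{v_1^T u}{v_1^T v_1}, \frac{v_2^T u}{v_2^T v_2}, \frac{v_3^T u}{v_3^T v_3} \right)
         = \left( \frac{\langle u, v_1 \rangle}{\|v_1\|^2}, \frac{\langle u, v_2 \rangle}{\|v_2\|^2}, \frac{\langle u, v_3 \rangle}{\|v_3\|^2} \right)
         = (0, 2, 0). \]
□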
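Remark 4.4.4. (QR decomposition) Let $A = [u_1 : u_2 : \cdots : u_m]$ be an
$n \times m$ matrix with $m$ linearly independent columns. Then using the Gram-Schmidt
process the columns of $A$ can be converted into an orthonormal set
of vectors which are the columns of $Q = [q_1 : q_2 : \cdots : q_m]$. Then the transition
matrix from the basis $\{u_1, u_2, \dots, u_m\}$ to $\{q_1, q_2, \dots, q_m\}$ is

\[ R = \begin{bmatrix}
u_1^T q_1 & u_2^T q_1 & \cdots & u_m^T q_1 \\
u_1^T q_2 & u_2^T q_2 & \cdots & u_m^T q_2 \\
\vdots & \vdots & & \vdots \\
u_1^T q_m & u_2^T q_m & \cdots & u_m^T q_m
\end{bmatrix}. \]

By the Gram-Schmidt process we know that $u_i^T q_j = 0$ for every $i < j$ because
$q_j \perp \operatorname{span}\{u_1, u_2, \dots, u_{j-1}\}$. Therefore $R$ is an upper triangular matrix with

\[ R = \begin{bmatrix}
u_1^T q_1 & u_2^T q_1 & \cdots & u_m^T q_1 \\
0 & u_2^T q_2 & \cdots & u_m^T q_2 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & u_m^T q_m
\end{bmatrix}. \]

That is, $A = QR$ and $A$ is decomposed into the product of a matrix with
orthonormal columns and an upper triangular matrix. □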


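Example 4.4.5. Find the QR decomposition of the matrix

\[ A = \begin{bmatrix} 0 & -1 & 4 \\ 1 & 1 & -5 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = [c_1 \; c_2 \; c_3], \]

where $c_1$, $c_2$ and $c_3$ are the columns of $A$.

Solution: The matrix contains a block of the identity matrix $I_3$. Therefore
we have $\operatorname{rank}(A) = 3$. Then the dimension of the column space of $A$ is 3 and
the columns of $A$ form a basis. Applying the Gram-Schmidt process to the
basis $c_1$, $c_2$ and $c_3$ of the column space we have

\[ b_1 = c_1, \]
\[ b_2 = c_2 - \frac{c_2^T b_1}{b_1^T b_1} b_1 = \tfrac{1}{2}(-2, 1, -1, 2, 0), \]
\[ b_3 = c_3 - \frac{c_3^T b_2}{b_2^T b_2} b_2 - \frac{c_3^T b_1}{b_1^T b_1} b_1 = \tfrac{1}{5}(7, -6, 6, 13, 5). \]

Normalization on $(b_1, b_2, b_3)$ gives

\[ q_1 = \tfrac{1}{\sqrt{2}}(0, 1, 1, 0, 0), \quad
   q_2 = \tfrac{1}{\sqrt{10}}(-2, 1, -1, 2, 0), \quad
   q_3 = \tfrac{1}{\sqrt{315}}(7, -6, 6, 13, 5). \]

Then the matrix $Q$ is

\[ Q = \begin{bmatrix}
0 & -\frac{2}{\sqrt{10}} & \frac{7}{\sqrt{315}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{10}} & -\frac{6}{\sqrt{315}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{10}} & \frac{6}{\sqrt{315}} \\
0 & \frac{2}{\sqrt{10}} & \frac{13}{\sqrt{315}} \\
0 & 0 & \frac{5}{\sqrt{315}}
\end{bmatrix} \]

and the matrix $R$ is

\[ R = \begin{bmatrix}
c_1^T q_1 & c_2^T q_1 & c_3^T q_1 \\
0 & c_2^T q_2 & c_3^T q_2 \\
0 & 0 & c_3^T q_3
\end{bmatrix}
= \begin{bmatrix}
\sqrt{2} & \frac{\sqrt{2}}{2} & -\frac{5\sqrt{2}}{2} \\
0 & \frac{\sqrt{10}}{2} & -\frac{13}{\sqrt{10}} \\
0 & 0 & \frac{3\sqrt{7}}{\sqrt{5}}
\end{bmatrix}. \]

Therefore we have

\[ A = \begin{bmatrix} 0 & -1 & 4 \\ 1 & 1 & -5 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix}
0 & -\frac{2}{\sqrt{10}} & \frac{7}{\sqrt{315}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{10}} & -\frac{6}{\sqrt{315}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{10}} & \frac{6}{\sqrt{315}} \\
0 & \frac{2}{\sqrt{10}} & \frac{13}{\sqrt{315}} \\
0 & 0 & \frac{5}{\sqrt{315}}
\end{bmatrix}
\begin{bmatrix}
\sqrt{2} & \frac{\sqrt{2}}{2} & -\frac{5\sqrt{2}}{2} \\
0 & \frac{\sqrt{10}}{2} & -\frac{13}{\sqrt{10}} \\
0 & 0 & \frac{3\sqrt{7}}{\sqrt{5}}
\end{bmatrix}. \]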
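The QR construction of Remark 4.4.4 can be checked numerically. The following is a minimal sketch, not a production routine: the name `qr_gram_schmidt` is ours, floating-point arithmetic is used, and the three columns are our reading of the matrix in Example 4.4.5.

```python
import math

def dot(u, v):
    """Dot product u^T v."""
    return sum(x * y for x, y in zip(u, v))

def qr_gram_schmidt(columns):
    """Return (Q_columns, R) with A = QR, where A has the given columns.

    Q_columns holds the orthonormal vectors q_1, ..., q_m and
    R[i][j] = c_j^T q_i for i <= j (zero below the diagonal).
    """
    q_cols = []
    for c in columns:
        b = [float(x) for x in c]
        for q in q_cols:
            coeff = dot(q, c)  # projection coefficient onto a unit vector
            b = [bi - coeff * qi for bi, qi in zip(b, q)]
        norm = math.sqrt(dot(b, b))
        q_cols.append([bi / norm for bi in b])
    m = len(columns)
    R = [[dot(columns[j], q_cols[i]) if i <= j else 0.0 for j in range(m)]
         for i in range(m)]
    return q_cols, R

cols = [(0, 1, 1, 0, 0), (-1, 1, 0, 1, 0), (4, -5, 0, 0, 1)]
Q, R = qr_gram_schmidt(cols)
for row in R:
    print([round(x, 4) for x in row])
```

Because each column satisfies $c_j = \sum_{i \le j} (c_j^T q_i) q_i$, multiplying the computed factors back together should reproduce the original columns up to rounding.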


Exercise 4.4.6.

FIGURE 4.3: $Hx$ is the mirror reflection of $x$ with respect to the hyperplane
$S$ with unit normal $u$.


1. Let W = span{w1, wa, w3} be a subspace of Rt, where 
MAA 3, 1), ws = (1, 2, 4, 0). 


i) Show that $\{w_1, w_2, w_3\}$ is linearly independent. Then use the Gram-Schmidt
process to convert $\{w_1, w_2, w_3\}$ into an orthogonal basis $V =
\{v_1, v_2, v_3\}$ of $W$. ii) Find the coordinate vector $[u]_V$ of $u = (1, 0, 2, 0)$
relative to the orthogonal basis $V$.


2. Show that there exists an $m \times n$ matrix $A$, $n < m$, such that $A^T A = I_n$
but $A A^T \neq I_m$, where $I_n$ and $I_m$ are the $n \times n$ and $m \times m$ identity matrices,
respectively.


3. Show that every finite dimensional inner product space has an orthonormal 
basis. 


—1 1 1 
1 — 1 1 . oe bids 28 
4. Let A= 1 rel HE Find a QR decomposition of A, if it exists. 
1 1 1 -1 
—1 1 
5. Let A= f E . Find a QR decomposition of A, if it exists. 
1 1 




6. Let $A = [c_1 : c_2 : \cdots : c_m]$ be an $n \times m$ matrix with $m$ linearly independent
columns. Suppose that the Gram-Schmidt process converted the columns of $A$
into an orthogonal set of vectors which are the columns of $B = [b_1 : b_2 : \cdots : b_m]$.
Find the matrix $R$ such that

\[ A = BR. \]


7. Let $S = \{1, x, x^2\}$ be a basis for $P_2$. Define an inner product on $P_2$ by


(f, 9) = L f(x)g(x)dz. 


If we replace the dot product in the Gram-Schmidt process with this inner product,
can you convert $S$ into an orthogonal basis for $P_2$?


8. Let $u \in \mathbb{R}^n$ be a unit vector. What is the representation matrix $H$ of the
mirror reflection about the plane orthogonal to $u$? Is $H$ an orthogonal matrix?
See Figure 4.3.


9. Let $v$ be a unit vector in $\mathbb{R}^n$. Show that the matrix $P = I - 2vv^T$ is an
orthogonal and symmetric matrix. We call $P = I - 2vv^T$ with $v^T v = 1$ a
Householder matrix and call the map $x \mapsto Px$, $x \in \mathbb{R}^n$, a Householder
transformation.


10. Let $x, y \in \mathbb{R}^n$ be such that $x \neq y$ and $\|x\| = \|y\|$. Let $u = \dfrac{x - y}{\|x - y\|}$ and
$P = I - 2uu^T$.

i) Show that $Px = y$ and $Py = x$.

ii) Show that for every $x \in \mathbb{R}^n$ and $y = \pm\|x\| e_1$, where $e_1$ is the first vector of the
standard basis of $\mathbb{R}^n$, there exists a Householder matrix $P$ such that $Px = y$.


11. Let $x = (3, 4, 0, 0) \in \mathbb{R}^4$ and $y = (5, 0, 0, 0) \in \mathbb{R}^4$. Find a Householder
matrix $P$ such that $Px = y$.


12. Let $A = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$. Use the Householder transformation to find an
orthogonal matrix $Q$ such that $Q^T A$ is upper triangular.


13. Let $A$ be an $m \times n$ matrix with linearly independent columns. Show that
there exists a sequence of orthogonal matrices $P_1, P_2, \dots, P_m$ such that

\[ P_m P_{m-1} \cdots P_2 P_1 A = R \]

is upper triangular.


14. Explain that a product of symmetric orthogonal matrices is orthogonal 
but may not be symmetric. 


Taylor & Francis 
Taylor & Francis Group 
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5.1 Introduction to determinants 


We know that the $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible if and only if the
scalar $ad - bc \neq 0$. And we also learned that elementary row operations do not
change invertibility of a matrix. If we regard the scalar $ad - bc$ as the value of
a function, called the determinant $\det$, acting on the matrix $A$, we wish to know
how elementary row operations will change the function value.

1) Multiply a row of $A$ by a constant $t$. Then

\[ \det\left(\begin{bmatrix} ta & tb \\ c & d \end{bmatrix}\right) = tad - tbc = t(ad - bc) = t\det(A). \]

Namely, $\det$ is a homogeneous function of its rows. Moreover, let $E_1 =
\begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix}$. We have

\[ \det(E_1 A) = \det(E_1)\det(A). \]

2) Add a multiple of one row to another. Then

\[ \det\left(\begin{bmatrix} a & b \\ ta + c & tb + d \end{bmatrix}\right) = a(tb + d) - b(ta + c) = tab + ad - tab - bc = \det(A). \]

That is, adding a multiple of one row to another does not change the
determinant. Moreover, let $E_2 = \begin{bmatrix} 1 & 0 \\ t & 1 \end{bmatrix}$. We have

\[ \det(E_2 A) = \det(E_2)\det(A). \]

3) Interchange two rows. Then

\[ \det\left(\begin{bmatrix} c & d \\ a & b \end{bmatrix}\right) = bc - ad = -\det(A). \]

That is, interchanging two rows changes the sign of the determinant.
Let $E_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. We have

\[ \det(E_3 A) = \det(E_3)\det(A). \]

Items 1) and 2) hint that $\det$ may be a linear function of the rows of a
matrix. Indeed,

\[ \det\left(\begin{bmatrix} a + e & b + f \\ c & d \end{bmatrix}\right) = d(a + e) - c(b + f) = ad - bc + ed - fc
   = \det\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) + \det\left(\begin{bmatrix} e & f \\ c & d \end{bmatrix}\right), \]

\[ \det\left(\begin{bmatrix} ta & tb \\ c & d \end{bmatrix}\right) = t\det(A). \]
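The three observations above are easy to confirm on a concrete matrix. The sketch below is only an illustration: the helper `det2` and the particular entries and multiplier `t` are our own choices.

```python
def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

A = [[3, 5], [2, 7]]  # arbitrary illustrative entries
t = 4

# 1) multiply the first row by t
scaled = [[t * A[0][0], t * A[0][1]], A[1]]
# 2) add t times the first row to the second row
sheared = [A[0], [t * A[0][0] + A[1][0], t * A[0][1] + A[1][1]]]
# 3) interchange the two rows
swapped = [A[1], A[0]]

print(det2(A), det2(scaled), det2(sheared), det2(swapped))
```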


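Let $M_{nn}$ be the vector space of $n \times n$ matrices. We want to define a function
$\det : M_{nn} \to \mathbb{R}$ called a determinant which satisfies the following properties.

P1) $\det$ is linear in each of the rows of $A \in M_{nn}$. For the first row, for every
$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$ in $M_{nn}$, every row vector $f \in \mathbb{R}^n$ and every $t \in \mathbb{R}$,

\[ \det \begin{bmatrix} a_1 + f \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 = \det \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 + \det \begin{bmatrix} f \\ a_2 \\ \vdots \\ a_n \end{bmatrix},
 \qquad
 \det \begin{bmatrix} t a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 = t \det \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \]

and likewise for each of the other rows.

P2) Interchanging two rows changes the sign of $\det$:

\[ \det \begin{bmatrix} a_1 \\ \vdots \\ a_i \\ \vdots \\ a_j \\ \vdots \\ a_n \end{bmatrix}
 = -\det \begin{bmatrix} a_1 \\ \vdots \\ a_j \\ \vdots \\ a_i \\ \vdots \\ a_n \end{bmatrix}. \]

P3) Let $I$ be the identity matrix. Then

\[ \det(I) = 1. \]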


Determinant of permutation matrices 


A permutation matrix $P$ is a matrix obtained by interchanging rows of the
identity matrix $I$, and we know that $P^T = P^{-1}$. By properties P2) and P3),
we know that

\[ \det(P) = \pm 1. \]

The sign of $\det(P)$ is determined by the number of interchanges (also called
transpositions) of the rows of $I$ needed to obtain $P$. If the number is
odd, $\det(P) = -1$; if it is even, $\det(P) = 1$. To determine the number of
transpositions, we regard the rows of $P$ as a permutation of the rows
of $I$, which is an ordering of the numbers $\{1, 2, \dots, n\}$. Let $S$ denote the
set of all permutations of the set of numbers $\{1, 2, \dots, n\}$. Every member
$\sigma = (\sigma(1), \sigma(2), \dots, \sigma(n)) \in S$ can be represented as

\[ \sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix}, \]

where the first row of $\sigma$ lists the row numbers and the value of $\sigma(i)$
indicates that the current $i$-th row of $P$ was the $\sigma(i)$-th row of $I$. For example,
the following representation of the permutation $\sigma$,

\[ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 3 & 2 & 1 & 4 \end{pmatrix}, \]

implies that the current first row of $P$ was the fifth row of $I$, the current
second row of $P$ was the third row of $I$, the current third row of $P$ was the
second row of $I$, and so on.
Now we count how many transpositions are needed to achieve $\sigma$ from $(12 \cdots n)$.
We use the so-called total inversions $\tau(\sigma)$ of $\sigma$, which is the number of
occurrences of $\sigma(j) < \sigma(i)$ with $j > i$. That is,

\[ \tau(\sigma) = \sum_{i=1}^{n} \bigl(\text{number of } j\text{'s with } \sigma(i) > \sigma(j) \text{ and } i < j\bigr). \]

For example $\tau(53214) = 4 + 2 + 1 + 0 + 0 = 7$, $\tau(13542) = 0 + 1 + 2 + 1 + 0 = 4$.
It turns out that $\tau(\sigma)$ is the minimal number of transpositions of adjacent
entries needed to achieve $\sigma$ from $(12 \cdots n)$. Then by properties P2) and P3), we have


Lemma 5.1.1.

\[ \det(P) = (-1)^{\tau(\sigma)}, \]

where $\sigma$ is the permutation of the rows of $I$ to obtain the permutation
matrix $P$ and $\tau(\sigma)$ is the total inversions of $\sigma$.
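Counting inversions is mechanical, so a short sketch may help. The function names below are ours; a permutation is written as a tuple $(\sigma(1), \dots, \sigma(n))$ of the numbers $1, \dots, n$, as in the text.

```python
def total_inversions(sigma):
    """Number of pairs i < j with sigma(i) > sigma(j)."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

def permutation_matrix(sigma):
    """Matrix whose i-th row is the sigma(i)-th row of the identity."""
    n = len(sigma)
    return [[1 if col == sigma[row] - 1 else 0 for col in range(n)]
            for row in range(n)]

def sign(sigma):
    """The value (-1)^tau(sigma) appearing in Lemma 5.1.1."""
    return (-1) ** total_inversions(sigma)

print(total_inversions((5, 3, 2, 1, 4)))  # the example tau(53214)
print(total_inversions((1, 3, 5, 4, 2)))  # the example tau(13542)
print(permutation_matrix((2, 1, 3)), sign((2, 1, 3)))
```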
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Example 5.1.2. 
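1) Let

\[ P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

Then $P$ is a permutation matrix obtained from $I$ by the permutation $\sigma = (213)$
with representation $\sigma = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$ and with $\tau(\sigma) = \tau(213) = 1 + 0 + 0 = 1$.
Then $\det(P) = (-1)^1 = -1$.
2) Let

\[ P = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}. \]

Then $P$ is a permutation matrix obtained from $I$ by the permutation
$\sigma = (24153)$ with representation $\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 4 & 1 & 5 & 3 \end{pmatrix}$ and with $\tau(\sigma) =
\tau(24153) = 1 + 2 + 0 + 1 + 0 = 4$. Then $\det(P) = (-1)^4 = 1$. □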


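Before we discuss determinants of general $n \times n$ matrices, let us count the
inversions of the inverse $\sigma^{-1}$ of a permutation $\sigma \in S$, which is the permutation
that restores the permutation $\sigma$ to the identity $\mathrm{id}$. That is, $\sigma^{-1} \circ \sigma = \mathrm{id}$. For
example,

\[ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 3 & 2 & 1 & 4 \end{pmatrix}, \qquad
   \sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ \sigma^{-1}(1) & \sigma^{-1}(2) & \sigma^{-1}(3) & \sigma^{-1}(4) & \sigma^{-1}(5) \end{pmatrix}
   = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 2 & 5 & 1 \end{pmatrix}. \]

By examining the matrix representation of permutations, we have

\[ \begin{aligned}
\tau(\sigma) &= \text{number of } \{(i, j) : \sigma(i) > \sigma(j) \text{ with } i < j\} \\
&= \text{number of } \{(\sigma^{-1}(s), \sigma^{-1}(t)) : s > t \text{ with } \sigma^{-1}(s) < \sigma^{-1}(t)\} \\
&= \text{number of } \{(s, t) : s > t \text{ with } \sigma^{-1}(s) < \sigma^{-1}(t)\} \\
&= \tau(\sigma^{-1}).
\end{aligned} \]

In summary we have


Lemma 5.1.3. Let $\sigma$ be a permutation of the set $\{1, 2, \dots, n\}$. Then
we have

\[ \tau(\sigma) = \tau(\sigma^{-1}), \]

where $\tau(\sigma)$ is the total of the inversions of $\sigma$.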
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Determinants of n x n matrices 


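Let $A = (a_{ij})$ be an $n \times n$ matrix with rows $a_1, a_2, \dots, a_n$, and let
$\{e_1, e_2, \dots, e_n\}$ be the standard basis of $\mathbb{R}^n$. Then we have

\[ \begin{aligned}
a_1 &= a_{11} e_1^T + a_{12} e_2^T + \cdots + a_{1n} e_n^T, \\
a_2 &= a_{21} e_1^T + a_{22} e_2^T + \cdots + a_{2n} e_n^T, \\
&\;\;\vdots \\
a_n &= a_{n1} e_1^T + a_{n2} e_2^T + \cdots + a_{nn} e_n^T.
\end{aligned} \]

By the linearity property of determinants, we have

\[ \begin{aligned}
\det(A) &= \det \begin{bmatrix} a_{11} e_1^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 + \det \begin{bmatrix} a_{12} e_2^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 + \cdots + \det \begin{bmatrix} a_{1n} e_n^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \\
&= a_{11} \det \begin{bmatrix} e_1^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 + a_{12} \det \begin{bmatrix} e_2^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
 + \cdots + a_{1n} \det \begin{bmatrix} e_n^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \\
&= \sum_{k_1 = 1}^{n} a_{1k_1} \det \begin{bmatrix} e_{k_1}^T \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \\
&= \sum_{k_1 = 1}^{n} \sum_{k_2 = 1}^{n} a_{1k_1} a_{2k_2} \det \begin{bmatrix} e_{k_1}^T \\ e_{k_2}^T \\ a_3 \\ \vdots \\ a_n \end{bmatrix} \\
&= \sum_{k_1 = 1}^{n} \sum_{k_2 = 1}^{n} \cdots \sum_{k_n = 1}^{n} a_{1k_1} a_{2k_2} \cdots a_{nk_n} \det \begin{bmatrix} e_{k_1}^T \\ e_{k_2}^T \\ \vdots \\ e_{k_n}^T \end{bmatrix} \\
&= \sum_{\sigma \in S} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} (-1)^{\tau(\sigma)},
\end{aligned} \]

where the last step is based on the observation that, by P2),

\[ \det \begin{bmatrix} e_{k_1}^T \\ e_{k_2}^T \\ \vdots \\ e_{k_n}^T \end{bmatrix} = 0 \quad \text{whenever } k_i = k_j \text{ for some } i \neq j, \]

since switching two identical rows will not change the matrix but will
reverse the sign of the determinant. Therefore only the permutation matrices
survive to have nonzero determinants in the summation. That is, the determinant
of an $n \times n$ matrix is the sum of every such signed product of $n$ entries
of $A$ that contains a unique entry from each row and each column. Therefore,
we have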


Theorem 5.1.4. If $\det : M_{nn} \to \mathbb{R}$ satisfies properties P1), P2) and
P3), then its value is uniquely determined by

\[ \det(A) = \sum_{\sigma \in S} (-1)^{\tau(\sigma)} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}, \]

where $S$ is the set of all permutations of the set $\{1, 2, \dots, n\}$ and $\tau(\sigma)$
is the total inversions of $\sigma$.


An immediate consequence is that


Corollary 5.1.5.


i) If $A \in M_{nn}$ has a zero row or zero column, then $\det(A) = 0$.


ii) If $A = (a_{ij})$ is a triangular matrix, then $\det(A) = a_{11} a_{22} \cdots a_{nn}$.


We leave it as an exercise to show that the function $f : M_{nn} \to \mathbb{R}$ defined
by

\[ f(A) = \sum_{\sigma \in S} (-1)^{\tau(\sigma)} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} \]

satisfies properties P1), P2) and P3).
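The permutation formula of Theorem 5.1.4 translates directly into code. The sketch below is for illustration only (the function names are ours, indices are 0-based, and the sum has $n!$ terms, so it is practical only for small $n$).

```python
from itertools import permutations

def total_inversions(sigma):
    """Number of pairs i < j with sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

def det_by_permutations(A):
    """Sum over all permutations of signed products, one entry per row and column."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = (-1) ** total_inversions(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

T = [[2, 1, 0], [0, 3, 5], [0, 0, 4]]  # triangular: expect 2 * 3 * 4
print(det_by_permutations(T))
```

For a triangular matrix every permutation other than the identity picks up at least one zero entry, which is the content of Corollary 5.1.5 ii).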
Exercise 5.1.6. 
1. Let A be an n x n matrix. If A has two proportional rows, then det(A) = 0. 


2. Let S be the set of all permutations of the numbers (1, 2, 3,--- , n). We call 
a permutation an even (odd) permutation if its total inversion is even (odd, 
respectively). Show that there are the same number of odd permutations and 
even permutations. 


3. Let S be the set of all permutations of the numbers (1, 2, 3,--- , n) and 
o € S. Show that every transposition on o changes its parity. 
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4. Find the total inversion, parity and matrix representation of each of the 
following permutations 


i) (123564); 
ii) (456213); 
iii) (123456). 


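5. Let $A$ be a $5 \times 5$ matrix. Find the coefficient of each of the following products
in the permutation expansion of $\det(A)$.

i) $a_{15} a_{24} a_{33} a_{42} a_{51}$;

ii) $a_{13} a_{25} a_{31} a_{42} a_{54}$;

iii) $a_{14} a_{23} a_{32} a_{45} a_{51}$;

iv) $a_{12} a_{23} a_{34} a_{45} a_{51}$.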


6. Find the coefficients of the terms of x? and x* for the polynomial f defined 
by 


x 2 x 1 
1 3r 1 —1 
f(x) = det 23 lz 2 
3 L 2 x 


0 1 0 0 0 0 1 0 0 
0 0 2 0 0 0 0 2 0 
A= 0 , B= : : n—2 
0 0 0 n-—1 n-1 0 0 0 0 
n 0 0 0 0 n 0 0 0 


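8. Let $A$ be an $n \times n$ matrix such that

\[ a_{ij} = \begin{cases} 1 & \text{if } i + j = n + 1, \\ 0 & \text{otherwise.} \end{cases} \]

Find $\det(A)$.


9. Let $A$ be an $n \times n$ matrix. Show that if $A$ has more than $n^2 - n$ zero entries,
then $\det(A) = 0$.


10. Show that the function $f : M_{nn} \to \mathbb{R}$ defined by

\[ f(A) = \sum_{\sigma \in S} (-1)^{\tau(\sigma)} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} \]

satisfies properties P1), P2) and P3).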
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5.2 Properties of determinants 


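We know that for a $2 \times 2$ matrix $A$, $\det(A) = \det(A^T)$. Indeed it is valid for
every $A \in M_{nn}$.


Theorem 5.2.1. $\det(A) = \det(A^T)$.


Proof. We have

\[ \det(A) = \sum_{\sigma \in S} (-1)^{\tau(\sigma)} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}. \]

Consider $a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}$, where $\sigma \in S$ is the column permutation with
the row permutation fixed as $(12 \cdots n)$. If the column permutation is fixed
as $(12 \cdots n)$, the row permutation is exactly $\sigma^{-1}$. By Lemma 5.1.3, we have
$\tau(\sigma) = \tau(\sigma^{-1})$. Therefore, one can fix the column permutation
to $(12 \cdots n)$ and obtain that

\[ \begin{aligned}
\det(A) &= \sum_{\sigma \in S} (-1)^{\tau(\sigma)} a_{\sigma^{-1}(1)1} a_{\sigma^{-1}(2)2} \cdots a_{\sigma^{-1}(n)n} \\
&= \sum_{\sigma \in S} (-1)^{\tau(\sigma^{-1})} a_{\sigma^{-1}(1)1} a_{\sigma^{-1}(2)2} \cdots a_{\sigma^{-1}(n)n} \\
&= \sum_{\sigma^{-1} \in S} (-1)^{\tau(\sigma^{-1})} a_{\sigma^{-1}(1)1} a_{\sigma^{-1}(2)2} \cdots a_{\sigma^{-1}(n)n} \\
&= \det(A^T).
\end{aligned} \]

□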


A significant consequence of Theorem 5.2.1 is that the properties of de- 
terminants described at P1) and P2) with respect to rows are also valid in 
terms of columns. Now we use Theorem 5.2.1 to establish the co-factor formula 
which can be used to compute determinants. 

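Let $A = (a_{ij})$ be an $n \times n$ matrix with rows $a_1, a_2, \dots, a_n$. Let $M_{ij}$ denote
the submatrix of $A$ with the $i$-th row and the $j$-th column deleted. We have

\[ a_i = a_{i1} e_1^T + a_{i2} e_2^T + \cdots + a_{in} e_n^T \]

and

\[ \begin{aligned}
\det(A) &= \det \begin{bmatrix} a_1 \\ \vdots \\ a_{i1} e_1^T \\ \vdots \\ a_n \end{bmatrix}
 + \det \begin{bmatrix} a_1 \\ \vdots \\ a_{i2} e_2^T \\ \vdots \\ a_n \end{bmatrix}
 + \cdots + \det \begin{bmatrix} a_1 \\ \vdots \\ a_{in} e_n^T \\ \vdots \\ a_n \end{bmatrix} \\
&= a_{i1} \det \begin{bmatrix} a_1 \\ \vdots \\ e_1^T \\ \vdots \\ a_n \end{bmatrix}
 + a_{i2} \det \begin{bmatrix} a_1 \\ \vdots \\ e_2^T \\ \vdots \\ a_n \end{bmatrix}
 + \cdots + a_{in} \det \begin{bmatrix} a_1 \\ \vdots \\ e_n^T \\ \vdots \\ a_n \end{bmatrix}.
\end{aligned} \]

By means of interchanging adjacent rows and adjacent columns, we can put the entry
1 of $e_j^T$ from row $i$ into the $(n, n)$-position, using $(n - i) + (n - j)$ interchanges.
Then by property P2), and noticing that $(-1)^{(n-i)+(n-j)} = (-1)^{i+j}$, we have

\[ \begin{aligned}
\det(A) &= a_{i1} (-1)^{i+1} \det \begin{bmatrix} M_{i1} & * \\ 0 & 1 \end{bmatrix}
 + \cdots + a_{ij} (-1)^{i+j} \det \begin{bmatrix} M_{ij} & * \\ 0 & 1 \end{bmatrix}
 + \cdots + a_{in} (-1)^{i+n} \det \begin{bmatrix} M_{in} & * \\ 0 & 1 \end{bmatrix} \\
&= a_{i1} (-1)^{i+1} \det M_{i1} + \cdots + a_{ij} (-1)^{i+j} \det M_{ij} + \cdots + a_{in} (-1)^{i+n} \det M_{in}.
\end{aligned} \]


Theorem 5.2.2. (co-factor expansion) Let $A = (a_{ij})$ be an $n \times n$
matrix and $M_{ij}$ be the submatrix of $A$ after the $i$-th row and the $j$-th
column of $A$ are deleted. Then

\[ \det(A) = a_{i1} (-1)^{i+1} \det M_{i1} + \cdots + a_{ij} (-1)^{i+j} \det M_{ij} + \cdots + a_{in} (-1)^{i+n} \det M_{in}. \]


Define $C_{ij} = (-1)^{i+j} \det M_{ij}$. We call $C_{ij}$ the co-factor associated with
the entry $a_{ij}$. We leave it as an exercise to show that co-factor expansion is
also valid in terms of columns.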
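The recursion behind Theorem 5.2.2 can be sketched as follows. The names `minor` and `det_cofactor` are ours, rows and columns are numbered from 0 (which leaves the parity of $i + j$ unchanged), and the test matrix is an arbitrary choice.

```python
def minor(A, i, j):
    """Submatrix M_ij: delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A, i=0):
    """Co-factor expansion of det(A) along row i."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_cofactor(minor(A, i, j))
               for j in range(n))

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
# Expanding along any row should give the same value.
print([det_cofactor(A, i) for i in range(3)])
```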


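Example 5.2.3. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$. By the co-factor expansion formula
along the first row, we have

\[ \det(A) = 1 \cdot \det \begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix}
 - 2 \cdot \det \begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix}
 + 3 \cdot \det \begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix}
 = -3 + 12 - 9 = 0. \]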


In principle, one can use the co-factor formula to compute the determinant
of every $n \times n$ matrix, by recursively reducing the sizes of the submatrices
involved in the co-factor formula. However, the amount of computation is
huge when $n$ is large, since a full recursive expansion involves on the order of $n!$
products. In fact, elementary row operations also play important
roles in the computation of determinants.


Theorem 5.2.4. Let A = (a;;) be an n x n matrix. If E is an elemen- 
tary matrix, then 


det(E-A) = det(E) det(A). 


Proof. We distinguish three cases:


i) If $E$ denotes the row operation of interchanging two rows, say, row $i$
interchanged with row $j$ ($i < j$), then $E$ is a permutation matrix which is obtained
by interchanging row $i$ with row $j$ of $I$, involving $2(j - i) - 1$ transpositions of adjacent rows.
Therefore

\[ \det(E) = (-1)^{2(j-i)-1} = -1. \]

On the other hand, by property P2), we have $\det(EA) = -\det(A) =
(-1)\det(A)$. That is, $\det(EA) = \det(E)\det(A)$.

ii) If $E$ denotes the row operation of adding an $r$-multiple of row $j$ to row $i$,
then $E$ is a triangular matrix with 1 in every main diagonal position. By
Corollary 5.1.5, we have $\det(E) = 1$. On the other hand, by property P1), we
have

\[ \det(EA) = \det \begin{bmatrix} \vdots \\ a_i + r a_j \\ \vdots \end{bmatrix}
 = \det \begin{bmatrix} \vdots \\ a_i \\ \vdots \end{bmatrix} + r \det \begin{bmatrix} \vdots \\ a_j \\ \vdots \end{bmatrix}, \]

where the displayed $a_j$ is in the $i$-th row. Then the last determinant is 0 since interchanging
the $i$-th row and the $j$-th row of that matrix does not change the matrix, but by property
P2), the determinant has to change sign. Then we have $\det(EA) = \det(A)$.
That is, $\det(EA) = \det(E)\det(A)$.

iii) If $E$ denotes the row operation of multiplying a row by a constant $c \neq 0$,
say, row $i$ multiplied by $c$, then $E$ is a diagonal matrix with 1 in every main
diagonal position except for the $c$ at the $(i, i)$-position. Then by Corollary 5.1.5,
$\det(E) = c$. On the other hand, by property P1), we have $\det(EA) = c\det(A)$.
That is, $\det(EA) = \det(E)\det(A)$. □


Corollary 5.2.5. Let A = (a;;) be an n x n matrix. If two rows of A 


are proportional, then 
det(A) =0. 
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Theorem 5.2.6. A is invertible if and only if det(A) + 0. 


Proof. "$\Rightarrow$" If $A$ is invertible, then by Theorem 2.3.4, $A$ is a product of
elementary matrices. That is, there exist elementary matrices $E_1, E_2, \dots, E_k$
such that $A = E_k E_{k-1} \cdots E_1$. By Theorem 5.2.4, we have $\det(A) =
\det(E_k)\det(E_{k-1}) \cdots \det(E_1)$, which is nonzero because the determinant of
every elementary matrix is nonzero.

"$\Leftarrow$" Suppose not. That is, $A$ is not invertible. Then by Theorem 2.3.4,
$Ax = 0$ has nontrivial solutions. Hence the reduced row echelon form $U$
of $A$ has at least one zero row. That is, there exist elementary matrices
$E_1, E_2, \dots, E_k$ such that $E_k E_{k-1} \cdots E_1 A = U$. We have

\[ \det(E_k E_{k-1} \cdots E_1 A) = \det(E_k)\det(E_{k-1}) \cdots \det(E_1)\det(A) = \det(U) = 0, \]

which leads to $\det(A) = 0$ since $\det(E_i) \neq 0$, $i = 1, 2, \dots, k$. This is a
contradiction. □


Theorem 5.2.7. Let A and B be n x n matrices. Then 


det( AB) = det(A) det(B). 


Proof. If $A$ is not invertible, then by Theorem 2.3.6, $AB$ is not invertible.
By Theorem 5.2.6, $\det(AB) = 0$ and $\det(A) = 0$. That is, $\det(AB) =
\det(A)\det(B)$.

If $A$ is invertible, then by Theorem 2.3.4, $A$ is a product of elementary
matrices. That is, there exist elementary matrices $E_1, E_2, \dots, E_k$ such that
$A = E_k E_{k-1} \cdots E_1$. By Theorem 5.2.4, we have

\[ \begin{aligned}
\det(AB) &= \det(E_k E_{k-1} \cdots E_1 B) \\
&= \det(E_k)\det(E_{k-1}) \cdots \det(E_1)\det(B) \\
&= \det(A)\det(B).
\end{aligned} \]

□
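Theorems 5.2.4 and 5.2.7 suggest a practical way to compute determinants: reduce the matrix to triangular form while tracking how each row operation changes the determinant. The sketch below is one way to do this, under our own naming (`det_by_elimination`, `matmul`) and with exact fractions; the two test matrices are arbitrary.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via row reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the matrix is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det                  # a row interchange flips the sign
        det *= M[col][col]              # diagonal entry of the triangular form
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # adding a multiple of one row to another leaves det unchanged
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return det

def matmul(A, B):
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
B = [[0, 1, 2], [1, 0, 3], [4, -3, 8]]
print(det_by_elimination(A), det_by_elimination(B),
      det_by_elimination(matmul(A, B)))
```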


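Example 5.2.8. Compute the determinants of the following matrices

\[ \text{a) } A = \begin{bmatrix} 46159 & 46059 \\ 70281 & 70181 \end{bmatrix}; \qquad
   \text{b) } B = \begin{bmatrix} 4 & 2 & 2 & 2 \\ 2 & 4 & 2 & 2 \\ 2 & 2 & 4 & 2 \\ 2 & 2 & 2 & 4 \end{bmatrix}. \]

Solution: a) Subtracting the second column from the first column we have

\[ \det(A) = \det \begin{bmatrix} 100 & 46059 \\ 100 & 70181 \end{bmatrix} = 100(70181 - 46059) = 2412200. \]

b) Adding the second, the third and the fourth row to the first row, we
get a row of 10's without changing the determinant.

\[ \begin{aligned}
\det(B) &= \det \begin{bmatrix} 10 & 10 & 10 & 10 \\ 2 & 4 & 2 & 2 \\ 2 & 2 & 4 & 2 \\ 2 & 2 & 2 & 4 \end{bmatrix} && \text{(Add rows 2, 3, 4 to row 1)} \\
&= 5 \cdot \det \begin{bmatrix} 2 & 2 & 2 & 2 \\ 2 & 4 & 2 & 2 \\ 2 & 2 & 4 & 2 \\ 2 & 2 & 2 & 4 \end{bmatrix} && \text{(Factor 5 from row 1)} \\
&= 5 \cdot \det \begin{bmatrix} 2 & 2 & 2 & 2 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} && \text{(Subtract row 1 from rows 2, 3, 4)} \\
&= 5 \cdot 2 \cdot 2 \cdot 2 \cdot 2 && \text{(Determinant of a triangular matrix)} \\
&= 80.
\end{aligned} \]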

Since $AA^{-1} = I$, by Theorem 5.2.7 we have


Corollary 5.2.9. If $A$ is an invertible matrix, then

\[ \det(A^{-1}) = \frac{1}{\det(A)}. \]


Another application of Theorem 5.2.7 is Cramer's rule. Consider $Ax =
b$, where $A$ is an $n \times n$ invertible matrix with columns $a_1, a_2, \dots, a_n$ and $x, b \in \mathbb{R}^n$.
Let $\{e_1, e_2, \dots, e_n\}$ be the standard basis of $\mathbb{R}^n$. Since $Ax = b$ is consistent,
the following matrix product holds:

\[ A[e_1 : \cdots : e_{i-1} : x : e_{i+1} : \cdots : e_n] = [a_1 : \cdots : a_{i-1} : b : a_{i+1} : \cdots : a_n]. \tag{5.1} \]

Since the $i$-th coordinate of $x$ is the only nontrivial entry in the $i$-th row of
$[e_1 : \cdots : e_{i-1} : x : e_{i+1} : \cdots : e_n]$, we have by co-factor expansion

\[ \det[e_1 : \cdots : e_{i-1} : x : e_{i+1} : \cdots : e_n] = x_i. \]

Let $B_i$ denote the matrix obtained by replacing the $i$-th column of $A$ with $b$.
Equality (5.1) can be rewritten as

\[ A[e_1 : \cdots : e_{i-1} : x : e_{i+1} : \cdots : e_n] = B_i, \]

which by Theorem 5.2.7 leads to

\[ \det(A) x_i = \det(B_i). \]

In summary, we have


Corollary 5.2.10. (Cramer's rule) If $A$ is an invertible matrix, then
the solution of $Ax = b$ for $x = (x_1, x_2, \dots, x_n)$ is

\[ x_1 = \frac{\det(B_1)}{\det(A)}, \quad x_2 = \frac{\det(B_2)}{\det(A)}, \quad \dots, \quad x_n = \frac{\det(B_n)}{\det(A)}, \]

where $B_i$, $i = 1, 2, \dots, n$, is the matrix obtained by replacing the $i$-th
column of $A$ with $b$.
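Cramer's rule is short to implement for small systems. The sketch below uses our own helper names (`det`, `cramer`), a first-row co-factor expansion for the determinants, and exact fractions; the sample system is chosen for illustration.

```python
from fractions import Fraction

def det(A):
    """Determinant by co-factor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n))

def cramer(A, b):
    """Solve Ax = b by x_i = det(B_i) / det(A), assuming det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires an invertible matrix")
    solution = []
    for i in range(len(A)):
        # B_i: replace the i-th column of A with b
        B_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(B_i), d))
    return solution

A = [[2, -1, 3], [1, 4, 2], [3, 2, 1]]
b = [0, 1, 2]
x = cramer(A, b)
print([str(v) for v in x])
```

Since the rule needs $n + 1$ determinants, it is mainly of theoretical interest; elimination is far cheaper for large systems.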


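Example 5.2.11. Solve the following system of equations using Cramer's
rule.

\[ \begin{aligned}
2x - y + 3z &= 0 \\
x + 4y + 2z &= 1 \\
3x + 2y + z &= 2.
\end{aligned} \]

Solution: The system of linear equations can be written as

\[ \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 2 \\ 3 & 2 & 1 \end{bmatrix}
   \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}. \]

We note that

\[ \det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 2 \\ 3 & 2 & 1 \end{bmatrix} = -35 \neq 0. \]

Therefore, Cramer's rule applies. By Cramer's rule we have

\[ x = \frac{\det \begin{bmatrix} 0 & -1 & 3 \\ 1 & 4 & 2 \\ 2 & 2 & 1 \end{bmatrix}}{\det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 2 \\ 3 & 2 & 1 \end{bmatrix}} = \frac{-21}{-35} = \frac{21}{35}, \qquad
   y = \frac{\det \begin{bmatrix} 2 & 0 & 3 \\ 1 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix}}{\det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 2 \\ 3 & 2 & 1 \end{bmatrix}} = \frac{-9}{-35} = \frac{9}{35}, \]

and

\[ z = \frac{\det \begin{bmatrix} 2 & -1 & 0 \\ 1 & 4 & 1 \\ 3 & 2 & 2 \end{bmatrix}}{\det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 2 \\ 3 & 2 & 1 \end{bmatrix}} = \frac{11}{-35} = -\frac{11}{35}. \]

That is, the solution is

\[ (x, y, z) = \left( \frac{21}{35}, \frac{9}{35}, -\frac{11}{35} \right). \]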

We close this section with an application of the co-factor formula and
Theorem 5.2.7. Consider $A \in M_{nn}$ with rows $a_1, a_2, \dots, a_n$. We have
for every $i = 1, 2, \dots, n$,

\[ \det(A) = a_{i1} C_{i1} + a_{i2} C_{i2} + \cdots + a_{in} C_{in}. \]

It is interesting to examine $a_{j1} C_{i1} + a_{j2} C_{i2} + \cdots + a_{jn} C_{in}$ as well. It turns out
that $a_{j1} C_{i1} + a_{j2} C_{i2} + \cdots + a_{jn} C_{in}$ is a co-factor expansion of the determinant
of a matrix that is NOT $A$, but is the matrix obtained by replacing the
$i$-th row of $A$ with its $j$-th row. If $i \neq j$, such a matrix has determinant zero because
it has two identical rows. Therefore we have

\[ a_{j1} C_{i1} + a_{j2} C_{i2} + \cdots + a_{jn} C_{in} = \begin{cases} \det(A) & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \]

or equivalently

\[ A \begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{nn}
\end{bmatrix} = \det(A) I. \]


Definition 5.2.12. We call the transpose of the matrix of co-factors of $A$ the
adjoint of $A$, denoted by $\operatorname{adj}(A)$.


Theorem 5.2.13. Let $A$ be an $n \times n$ matrix. Then

\[ A \operatorname{adj}(A) = \det(A) I. \]

In particular, if $A$ is invertible, then

\[ A^{-1} = \frac{\operatorname{adj}(A)}{\det(A)}. \]
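The identity $A \operatorname{adj}(A) = \det(A) I$ can be verified on a small matrix. The following sketch uses helper names of our own (`minor`, `det`, `adjugate`, `matmul`) and an arbitrary $3 \times 3$ integer matrix.

```python
def minor(A, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by co-factor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

def adjugate(A):
    """Transpose of the matrix of co-factors C_ij (for n >= 2)."""
    n = len(A)
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    return [list(col) for col in zip(*cof)]

def matmul(A, B):
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print(det(A))
print(matmul(A, adjugate(A)))  # expected: det(A) times the identity
```

Dividing `adjugate(A)` entrywise by `det(A)` then gives $A^{-1}$ whenever the determinant is nonzero.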


Exercise 5.2.14. 


1. Let 


Find det(A) by the following three methods: 1) Co-factor expansion; 2) Ele- 
mentary row operations; 3) The permutation formula for determinants. 




2. Find the determinants of the following matrices. 


—2 


3. Find the determinants of the following matrices. 


1 
3 
1) 4A= |; 
K 
[o 
1 
3) C=], 
0 


N DUO ON OW 


4. Find the determinants of the following matrices. 


Qa SR NAWU 


ONN > RN OW 


3 5 6 
5 7]; 2) B=|1 
7 9 8 
0 0 0 
5 01; 4) D=]0 
-1 7 8 
5 7 0 
7 9 1 
9 11’ a) = 8 
11 13 —1 
2 ol [o 
0 5 4) De 0 
4 0? ~ 10 
5 0 4 
5 7 5 
7 1 1 
a gle Bla 
5 3 1 
c d x 
d c o ly 
a bl? 4) D= i 
b a y 


e e RE Ea p T 


O OO won 


mow oN 


ork NW © 


e Ree FoR — 
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EON FO WD 


oe © N 
O ou 


5. If $A$ is an $n \times n$ matrix with $\det(A) = 2$: i) Find $\det(\operatorname{adj}(A))$; ii) If
$(\operatorname{adj}(A))_{ij} = 1$, which entry of $A^{-1}$ do you know the explicit value of?


6. Show that if $A$ is symmetric, then $\operatorname{adj}(A)$ is also symmetric.


7. Show that if $\det(A) = 0$, then $\det(\operatorname{adj}(A)) = 0$. (Hint: Note that

\[ \operatorname{adj}(A) A = A \operatorname{adj}(A) = \det(A) I, \]

and show that $\operatorname{adj}(A)$ has a nontrivial null space if $\det(A) = 0$.)


8. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Show that the area of the parallelogram determined by
the vectors $0$, $(a, b)$, $(c, d)$ and $(a + c, b + d)$ is $|\det(A)|$.
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9. Let 
1 a a ot 
he | z2 al ac Să 
L ip. EE ies 


Find det(A). 


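10. Let $n > 2$, $n \in \mathbb{N}$ and $A_n$ be an $n \times n$ tridiagonal matrix given by

\[ A_n = \begin{bmatrix}
3 & 2 & 0 & \cdots & 0 & 0 \\
1 & 3 & 2 & \cdots & 0 & 0 \\
0 & 1 & 3 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 3 & 2 \\
0 & 0 & 0 & \cdots & 1 & 3
\end{bmatrix}. \]

Compute $D_n$, where $D_n = \det(A_n)$.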


11. Let $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$ and $y = (1, 1, \dots, 1) \in \mathbb{R}^n$. Let $A =
I + yx^T$. Find $\det(A)$.


12. Let $x = (1, 1, \dots, 1) \in \mathbb{R}^n$. Let $A = xx^T - I$. Find $\det(A)$ and $A^{-1}$.


13. Let det(A) = a, det(B) = b. Compute 


AC 
det E A i 
14. Use Cramer's rule to solve the following systems, if applicable. 
1) 
x+2y+3z=1 
z+3y=1 
y+2=1l; 
2) 
2y+3z=1 
r+3y=-l1 
z+y+z=1 
3) 
z+2y+3z=1l 
22+ 3y+4z=1 
5x + 3y + 4z = —-1; 


? 
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z—2y+3z=1 
x + 3y + 4z = —1 
y+z=1. 


15. Let $A = (a_{ij})$ be an $n \times n$ matrix and $M_{ij}$ be the submatrix of $A$ after
the $i$-th row and the $j$-th column of $A$ are deleted. Show that

\[ \det(A) = a_{1i} (-1)^{1+i} \det M_{1i} + \cdots + a_{ji} (-1)^{j+i} \det M_{ji} + \cdots + a_{ni} (-1)^{n+i} \det M_{ni}. \]

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6.1 Introduction to eigenvectors and eigenvalues 


There are many occasions on which we model the input $x_k$ at the discrete
instant $k \in \mathbb{N}$ and the output $y_k$ at instant $k$ with a linear relation

\[ y_k = A x_k, \]

where $A$ is an $n \times n$ matrix. When each output serves as the next input
($x_{k+1} = y_k$), the output at any instant can be predicted
by $y_k = A^k x_1$, and sometimes we are even interested in the existence of the
limit $\lim_{k \to +\infty} y_k$. It is usually time consuming to compute $A^k$ when $k$ and
the dimension $n$ of $A$ are large. However, if $A$ is diagonal, then $A^k$ can be
easily computed. Indeed, we have

\[ A = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}
   \quad \text{and} \quad
   A^k = \begin{bmatrix} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lambda_n^k \end{bmatrix}. \]

Certainly not every matrix is diagonal. The question is how to reduce a
non-diagonal matrix into a diagonal one. Since we will compute successive
multiplications of $A$, we assume there exists an invertible matrix $P$ such that

\[ A = PDP^{-1}, \]

where $D$ is diagonal. Then we have

\[ A^2 = (PDP^{-1})(PDP^{-1}) = PD^2P^{-1}, \qquad A^k = PD^kP^{-1}. \]

This means that if we are able to find an invertible matrix $P$ such that
$A = PDP^{-1}$, we can easily compute $A^k$ with $A^k = PD^kP^{-1}$. Now the question
is transformed into finding a decomposition of $A$ with $A = PDP^{-1}$,
where $D$ is diagonal. To be more specific, let $P = [p_1 : p_2 : \cdots : p_n]$,
$D = \operatorname{diag}[\lambda_1, \lambda_2, \dots, \lambda_n]$. Then $A = PDP^{-1}$ is equivalent to $AP = PD$, that is,

\[ A[p_1 : p_2 : \cdots : p_n] = [\lambda_1 p_1 : \lambda_2 p_2 : \cdots : \lambda_n p_n]. \]

Namely, to achieve $A = PDP^{-1}$, we need to find the scalar-vector pairs
$(\lambda_i, p_i)$, $i = 1, 2, \dots, n$, such that

\[ A p_i = \lambda_i p_i, \]

and $p_i$, $i = 1, 2, \dots, n$, are linearly independent.
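The saving promised by diagonalization can be seen in a small experiment. In the sketch below everything is illustrative: the $2 \times 2$ matrix $P$ (whose columns play the role of $p_1, p_2$), the values $\lambda_1 = 2$, $\lambda_2 = 3$ and all function names are our own assumptions, and $A$ is built so that $AP = PD$ holds.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjoint formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

def power_by_multiplication(A, k):
    """A^k by k successive multiplications."""
    result = diag([1] * len(A))
    for _ in range(k):
        result = matmul(result, A)
    return result

def power_by_diagonalization(P, lams, k):
    """A^k from AP = PD: only the diagonal entries are raised to the k-th power."""
    return matmul(matmul(P, diag([lam ** k for lam in lams])), inverse_2x2(P))

P = [[1, 1], [1, 2]]          # hypothetical independent vectors p_1, p_2 as columns
lams = [2, 3]                 # hypothetical scalars lambda_1, lambda_2
A = matmul(matmul(P, diag(lams)), inverse_2x2(P))
print(A)
print(power_by_diagonalization(P, lams, 5) == power_by_multiplication(A, 5))
```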


Definition 6.1.1. Let $A$ be an $n \times n$ matrix. If there exists a nonzero
$x \in \mathbb{C}^n$ such that

\[ Ax = \lambda x \]

for some $\lambda \in \mathbb{C}$, we call $x \neq 0$ an eigenvector of $A$ and $\lambda$ the
eigenvalue of $A$ corresponding to $x$. If there are $n$ linearly independent
eigenvectors $p_1, p_2, \dots, p_n$ of $A$ with corresponding eigenvalues
$\lambda_1, \lambda_2, \dots, \lambda_n$, then the matrix $P = [p_1 : p_2 : \cdots : p_n]$ is such that

\[ P^{-1}AP = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \]

and we say that $A$ is diagonalizable.


Now we develop methods for computing eigenvalues of $A$. Notice from the
definition of eigenvectors that every eigenvector $x$ is nonzero and satisfies

\[ Ax = \lambda x \;\Leftrightarrow\; (A - \lambda I)x = 0, \]

which implies that $A - \lambda I$ is not invertible. Then it is necessary that every
eigenvalue $\lambda$ of $A$ satisfies

\[ \det(A - \lambda I) = 0, \]

whose left-hand side is a polynomial of degree $n$ in $\lambda$ and which has $n$ roots,
counted with multiplicity, in the complex domain $\mathbb{C}$. Conversely, if $\lambda \in \mathbb{C}$ satisfies
$\det(A - \lambda I) = 0$, then $A - \lambda I$ is not
invertible and there exists a nonzero $x$ which satisfies

\[ (A - \lambda I)x = 0 \;\Leftrightarrow\; Ax = \lambda x. \]

Theorem 6.1.2. Let $A$ be an $n \times n$ matrix. $\lambda \in \mathbb{C}$ is an eigenvalue of
$A$ if and only if

\[ \det(A - \lambda I) = 0. \]


For every $n \times n$ matrix $A$, we call $\det(A - \lambda I)$ the characteristic
polynomial of $A$, and call $N(A - \lambda I)$ the eigenspace of $A$ corresponding to the
eigenvalue $\lambda$.
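Theorem 6.1.2 gives a direct test for eigenvalues. The sketch below evaluates the characteristic polynomial of the $3 \times 3$ matrix used in Example 6.1.3 at integer candidates and checks candidate eigenvectors against the definition; the helper names and the search range $-5, \dots, 5$ are our own choices.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(A, lam):
    """Value of det(A - lam * I) for a 3 x 3 matrix A."""
    n = len(A)
    return det3([[A[r][c] - (lam if r == c else 0) for c in range(n)]
                 for r in range(n)])

def is_eigenpair(A, lam, x):
    """True if x is nonzero and A x = lam x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return any(xi != 0 for xi in x) and Ax == [lam * xi for xi in x]

A = [[-1, 0, 1], [-6, 2, 3], [0, 0, 1]]
roots = [lam for lam in range(-5, 6) if char_poly(A, lam) == 0]
print(roots)
print(is_eigenpair(A, -1, [1, 2, 0]),
      is_eigenpair(A, 1, [1, 0, 2]),
      is_eigenpair(A, 2, [0, 1, 0]))
```

Searching integer candidates only works here because this particular matrix has integer eigenvalues; in general the roots of the characteristic polynomial must be found by other means.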


Example 6.1.3. Let A be the following matrix: 


-1 0 1 
A=|-6 2 3 
0 0 1 


i) Find the eigenvalues A1, A2, Ag of A with A < A2 < As; 


ii) Find an eigenvector associated to each eigenvalue obtained in i) with 1 
as its first non-zero component; 


iii) Determine whether A is diagonalizable or not. If yes, find a matrix P 
such that P-LAP is diagonal. 


Solution: i) We solve the characteristic equation det(AJ — A) = 0 for the 
eigenvalues. That is 


A+1 0 1 
det 6 A-2  -3| =0S(A+1)(A-1)(A- 2) =0 
0 0 ed 


A = —1, A. = 1, Az = 2. 


ii) We solve the homogeneous system (AJ—A)a = 0 with x = (#1, £2, £3) € 
R for an eigenvector corresponding to every A € [A1, Az, Az). 


e For À = A; = —1, we have 
0 0 —1] [a 0 1 -3 0] [a 0 
6 —3 —3| |r| =|0| = |0 O 1| |z2| = |0 
0 0 —2| |23 0 0 0 O| [23 0 
1 
>a + (=3)02=0, 208 =0 
Tı st 
>r= |zr2| =|t|,tER 
T3 0 


Putting t = 2 in z, we obtain an eigenvector associated with the eigenvalue 
A: 


1 
pi = |2 
0 
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e For A = A2= 1, we have 


2 0 —1] [a 0 1 0 —3] [a 0 
6 —1 —3| |z]=|0|=10 -1 0] [22] = |0 
0 0 0] [as 0 0 0 Of [z3 0 


Tı st 
>x= |x%2| = | 0 „teR 
T3 t 


Putting t = 2 in x, we obtain an eigenvector associated with the eigenvalue 
A2: 


1 
p2 = |0 
2 
e For A = A3 = 2, we have 
3 0 —1| |x 0 1 0 OF} |x 0 
6 0 —3| |z2| = 10|] >|0 O 1| |z2| = |0 
0 0 1] | x3 0 0 0 0| |z3 0 
= Ii = 0, 73 = 0 
Tı 0 
>x< v2 = t „teR 
T3 0 


Putting t = 1 in z, we obtain an eigenvector associated with the eigenvalue 
As: 


p3 = |1 
0 


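iii) Let $P = [p_1 : p_2 : p_3]$. Then $\det(P) = -2 \neq 0$. Hence A has three
linearly independent eigenvectors $p_1, p_2, p_3$ and is diagonalizable. We have

\[ P = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \\ 0 & 2 & 0 \end{bmatrix}, \]

which satisfies

\[ P^{-1}AP = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \]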
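
The diagonalization in part iii) can be checked mechanically. The following is a minimal sketch in plain Python with exact rational arithmetic; the helper names `matmul` and `inverse` are illustrative choices made here, not notation from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(M):
    """Gauss-Jordan inverse with exact fractions (assumes M is invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[-1, 0, 1], [-6, 2, 3], [0, 0, 1]]
P = [[1, 1, 0], [2, 0, 1], [0, 2, 0]]  # columns are p1, p2, p3

D = matmul(matmul(inverse(P), A), P)
for row in D:
    print([str(v) for v in row])
```

The printed rows should form the diagonal matrix with entries -1, 1, 2, in the same order as the columns of P.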
Notice that $\det(\lambda I - A)|_{\lambda = 0} = (-1)^n \det(A)$. Then by
Theorem 6.1.2, we have

Theorem 6.1.4. Let A be an n x n matrix. A is invertible if and only
if $\lambda = 0$ is not an eigenvalue of A.

Example 6.1.5.

A projection matrix P with $P^2 = P$, $P^T = P$ and $P \neq I$ is not invertible,
and $\lambda = 0$ is always an eigenvalue.

An orthogonal matrix Q with $Q^TQ = I$ is invertible and $\lambda = 0$ is not an
eigenvalue.


Exercise 6.1.6. 


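1. For the following matrices, find the eigenvalues and a basis of the associated
eigenspaces.

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix},\quad
C = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}. \]

2. Find a 3 x 3 matrix A whose eigenvalues are -1, 2, 3 and the corresponding
eigenvectors are columns of

\[ P = [c_1 : c_2 : c_3] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \]

respectively.

3. Find a 3 x 3 matrix A whose eigenvalues are -1, 2, 2 and the corresponding
eigenvectors are columns of

\[ P = [c_1 : c_2 : c_3] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \]

respectively.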


4. Let A be an n x n matrix. Show that if $\lambda$ is an eigenvalue of A and x a
corresponding eigenvector, then for every $k \in \mathbb{N}$, $\lambda^k$ is an
eigenvalue of $A^k$ with x a corresponding eigenvector.

5. Let A be an n x n invertible matrix. Show that if $\lambda$ is an eigenvalue of A
and x a corresponding eigenvector, then for every $k \in \mathbb{Z}$, $\lambda^k$ is
an eigenvalue of $A^k$ with x a corresponding eigenvector.

6. Let f be a polynomial and A an n x n matrix. Show that if $\lambda$ is an
eigenvalue of A and x a corresponding eigenvector, then $f(\lambda)$ is an
eigenvalue of $f(A)$ with x a corresponding eigenvector.

7. Let f and g be polynomials and A an n x n matrix. Show that

\[ (f + g)(A) = f(A) + g(A), \qquad (fg)(A) = f(A)g(A). \]

8. Let A be an n x n matrix. Show that there exists a nonzero polynomial f
such that $f(A) = 0$.

9. Let A and P be n x n matrices with P invertible. Show that for every
polynomial f,

\[ f(P^{-1}AP) = P^{-1}f(A)P. \]

10. Let A and P be n x n real matrices with P invertible. Show that there
exists a $\lambda \in \mathbb{C}$ such that $A + \lambda P$ is not invertible.

11. Let A and P be n x n real matrices with P invertible. Show that if
$n \in \mathbb{N}$ is odd, then there exists a $\lambda \in \mathbb{R}$ such that
$A + \lambda P$ is not invertible.

12. Let A be an n x n diagonalizable matrix. Show that if every eigenvalue
$\lambda$ of A satisfies $|\lambda| < 1$, then

\[ \lim_{k \to \infty} \det(A^k) = 0. \]

6.2 Diagonalizability 


We have seen in Section 6.1 that if an n x n matrix A is diagonalizable,
namely, A has n linearly independent eigenvectors, then the powers of A can
be easily obtained. However, not every matrix has n linearly independent
eigenvectors. For example, the matrix

\[ J = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

has an eigenvalue $\lambda = 1$ with algebraic multiplicity 2 as a root of the
characteristic polynomial $p(\lambda) = (\lambda - 1)^2$. However, the null space
of $J - 1 \cdot I$ is

\[ N(J - I) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}, \]

which is one dimensional. That is, we cannot find two linearly independent
eigenvectors necessary to diagonalize J. Therefore, J is non-diagonalizable.
Definition 6.2.1. Let A be an n x n matrix and $\lambda_0 \in \mathbb{C}$ an
eigenvalue of A. We call the order of the factor $\lambda - \lambda_0$ in the
characteristic polynomial $\det(A - \lambda I)$ the algebraic multiplicity of
$\lambda_0$. We call the dimension of the nullspace $N(A - \lambda_0 I)$ the
geometrical multiplicity of $\lambda_0$.

Example 6.2.2. For the matrix $J = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$,
the algebraic multiplicity of the eigenvalue $\lambda_0 = 1$ is 2 because the
order of the factor $\lambda - 1$ in the characteristic polynomial
$\det(J - \lambda I) = (\lambda - 1)^2$ is 2. The geometrical multiplicity of the
eigenvalue $\lambda_0 = 1$ is 1 because

\[ \dim N(J - I) = \dim \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\} = 1. \]
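
The geometrical multiplicity can be computed as n minus the rank of the shifted matrix. The sketch below assumes this rank-nullity route and uses illustrative helper names (`rank`, `geometric_multiplicity`) that do not appear in the text.

```python
from fractions import Fraction

def rank(M):
    """Rank by forward elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(A, lam):
    """dim N(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

J = [[1, 1], [0, 1]]
I2 = [[1, 0], [0, 1]]
print(geometric_multiplicity(J, 1))   # eigenspace of J for eigenvalue 1
print(geometric_multiplicity(I2, 1))  # eigenspace of the identity for eigenvalue 1
```

Both matrices have the eigenvalue 1 with algebraic multiplicity 2, but only the identity has a two-dimensional eigenspace.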


Theorem 6.2.3. Let A be an n x n matrix. Let $\lambda_1, \lambda_2, \dots, \lambda_k$
be a set of distinct eigenvalues of A. Let
$S_1 = \{v_{11}, v_{12}, \dots, v_{1n_1}\}$, $S_2 = \{v_{21}, v_{22}, \dots, v_{2n_2}\}$,
..., $S_k = \{v_{k1}, v_{k2}, \dots, v_{kn_k}\}$ be bases of the nullspaces
$N(A - \lambda_j I)$, $j = 1, 2, \dots, k$. Then the union of the bases
$S_1 \cup S_2 \cup \cdots \cup S_k$ is linearly independent.

Proof. We first show that $S_1 \cup S_2$ is linearly independent. Regard $S_j$ as
the matrix whose columns are the basis vectors contained in $S_j$. Let
$c_j = (c_{j1}, c_{j2}, \dots, c_{jn_j})$, $j = 1, 2, \dots, k$, and consider

\[ S_1c_1 + S_2c_2 = 0. \tag{6.1} \]

Suppose, for the sake of contradiction, that $S_1 \cup S_2$ is linearly dependent.
Then (6.1) holds with $c_1 \neq 0$ and $c_2 \neq 0$; otherwise, one of $S_1$ and
$S_2$ is linearly dependent, which is impossible. Multiplying both sides of (6.1)
by A, we obtain that

\[ \lambda_1 S_1c_1 + \lambda_2 S_2c_2 = 0. \]

Multiplying both sides of $S_1c_1 + S_2c_2 = 0$ by $\lambda_1$ and subtracting it
from the above equality, we have

\[ (\lambda_2 - \lambda_1)S_2c_2 = 0, \]

which leads to $c_2 = 0$ since $\lambda_1 \neq \lambda_2$ and $S_2$ is linearly
independent. This is a contradiction. Therefore, $S_1 \cup S_2$ is linearly
independent.

Suppose that $S_1 \cup S_2 \cup \cdots \cup S_j$, $j < k$, is linearly independent.
Consider

\[ S_1c_1 + S_2c_2 + \cdots + S_jc_j + S_{j+1}c_{j+1} = 0. \tag{6.2} \]

Multiplying both sides of (6.2) by A, we obtain that

\[ \lambda_1S_1c_1 + \lambda_2S_2c_2 + \cdots + \lambda_jS_jc_j + \lambda_{j+1}S_{j+1}c_{j+1} = 0. \tag{6.3} \]

Multiplying both sides of (6.2) by $\lambda_{j+1}$ and subtracting it from (6.3),
we have

\[ (\lambda_1 - \lambda_{j+1})S_1c_1 + (\lambda_2 - \lambda_{j+1})S_2c_2 + \cdots + (\lambda_j - \lambda_{j+1})S_jc_j = 0. \tag{6.4} \]

Then $c_1 = c_2 = \cdots = c_j = 0$ since $S_1 \cup S_2 \cup \cdots \cup S_j$ is
linearly independent and the eigenvalues are distinct. Then by (6.2) we have
$c_{j+1} = 0$, and hence $S_1 \cup S_2 \cup \cdots \cup S_j \cup S_{j+1}$ is linearly
independent.

By mathematical induction, $S_1 \cup S_2 \cup \cdots \cup S_k$ is linearly
independent. □


An immediate consequence of Theorem 6.2.3 is that 


Corollary 6.2.4. Let A be an nx n matrix. 


i) If A has n distinct eigenvalues, then A is diagonalizable. 


ii) Eigenvectors associated with distinct eigenvalues are linearly inde- 
pendent. 


A necessary and sufficient condition for diagonalizability is the following 


Theorem 6.2.5. Let A be an n x n matrix. Then

i) for every eigenvalue $\lambda_0$ of A,

algebraic multiplicity of $\lambda_0$ $\geq$ geometrical multiplicity of $\lambda_0$.

ii) A is diagonalizable if and only if for every eigenvalue $\lambda_0$,

algebraic multiplicity of $\lambda_0$ = geometrical multiplicity of $\lambda_0$.

Proof. i) Let m be the geometrical multiplicity of $\lambda_0$. Let
$S_{\lambda_0} = \{v_1, v_2, \dots, v_m\}$ be a basis of $N(A - \lambda_0 I)$. Then we
can extend $S_{\lambda_0}$ into a basis $\{v_1, v_2, \dots, v_m, v_{m+1}, \dots, v_n\}$
of $\mathbb{R}^n$. Then we have

\[ A[v_1 : v_2 : \cdots : v_m : v_{m+1} : \cdots : v_n]
= [v_1 : v_2 : \cdots : v_m : v_{m+1} : \cdots : v_n]
\begin{bmatrix} \lambda_0 I_m & B \\ 0 & C \end{bmatrix}, \]

where $I_m$ is the m x m identity matrix, B is an m x (n - m) matrix and C is
an (n - m) x (n - m) matrix.

Let $P = [v_1 : v_2 : \cdots : v_m : v_{m+1} : \cdots : v_n]$. Then we have

\[ P^{-1}AP = \begin{bmatrix} \lambda_0 I_m & B \\ 0 & C \end{bmatrix} \]

and

\[ P^{-1}(A - \lambda I)P = \begin{bmatrix} (\lambda_0 - \lambda) I_m & B \\ 0 & C - \lambda I_{n-m} \end{bmatrix}. \]

Then we have

\[ \begin{aligned}
\det(A - \lambda I) &= \det((\lambda_0 - \lambda) I_m)\,\det(C - \lambda I_{n-m}) \\
&= (\lambda_0 - \lambda)^m \det(C - \lambda I_{n-m}).
\end{aligned} \]

Therefore, we obtain that

algebraic multiplicity of $\lambda_0$ $\geq$ m = geometrical multiplicity of $\lambda_0$.

ii) “⟹”: If A is diagonalizable, then A has n linearly independent
eigenvectors. Note that the sum of algebraic multiplicities of all eigenvalues
is n. If there exists an eigenvalue $\lambda_0$ whose algebraic multiplicity is
strictly larger than its geometrical multiplicity, then by i) the sum of the
dimensions of all the eigenspaces is less than n, and A has fewer than n
linearly independent eigenvectors. This is a contradiction.

“⟸”: If for every eigenvalue $\lambda_0$,

algebraic multiplicity of $\lambda_0$ = geometrical multiplicity of $\lambda_0$,

then the sum of the dimensions of all the eigenspaces is n. By Theorem 6.2.3,
A has n linearly independent eigenvectors. Hence A is diagonalizable. □


Similar matrices 


The idea of diagonalizing a square matrix extends to the situation that a
matrix is not diagonalizable. For a given matrix A, we say that B is similar
to A if there exists an invertible matrix P such that $B = P^{-1}AP$. Since then
$A = (P^{-1})^{-1}BP^{-1}$, B is similar to A if and only if A is similar to B.
If A is similar to B, we write A ~ B. It turns out that similar matrices share
many important properties, which are listed below:

Theorem 6.2.6. Let A, B be n x n matrices with $A = P^{-1}BP$. Then
we have

i) $\det(A) = \det(B)$ and $\det(A - \lambda I) = \det(B - \lambda I)$;

ii) rank(A) = rank(B);

iii) trace(A) = trace(B), where $\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii}$.


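Proof. i) We have

\[ \det(A) = \det(P^{-1}BP) = \det(P^{-1})\det(B)\det(P) = \det(B). \]

Moreover, we have

\[ \begin{aligned}
\det(A - \lambda I) &= \det(P^{-1}BP - \lambda I) \\
&= \det(P^{-1}(B - \lambda I)P) \\
&= \det(B - \lambda I).
\end{aligned} \]

ii) By Theorem 3.4.6, we have

\[ \operatorname{rank}(A) \leq \min\{\operatorname{rank}(P^{-1}), \operatorname{rank}(B), \operatorname{rank}(P)\} \leq \operatorname{rank}(B). \]

Since $B = PAP^{-1}$, we also have

\[ \operatorname{rank}(B) \leq \min\{\operatorname{rank}(P^{-1}), \operatorname{rank}(A), \operatorname{rank}(P)\} \leq \operatorname{rank}(A). \]

Therefore, we have rank(A) = rank(B).

iii) We first show that for every m x n matrix C and n x m matrix D, we
have trace(CD) = trace(DC). Indeed,

\[ \operatorname{trace}(CD) = \sum_{i=1}^{m} (CD)_{ii} = \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}d_{ji}
= \sum_{j=1}^{n}\sum_{i=1}^{m} d_{ji}c_{ij} = \sum_{j=1}^{n} (DC)_{jj} = \operatorname{trace}(DC). \]

Therefore, we have
$\operatorname{trace}(A) = \operatorname{trace}(P^{-1}BP) = \operatorname{trace}(PP^{-1}B) = \operatorname{trace}(B)$.
□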


as Me eS 
Example 6.2.7. Let 4 = E A PS ki o , B = PAP = E l . 
Then we have i) $\det(A) = -2 = \det(B)$, $\det(A - \lambda I) = \lambda^2 - 5\lambda - 2 = \det(B - \lambda I)$;
ii) rank(A) = rank(B) = 2; iii) trace(A) = 5 = trace(B). □
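
The invariants of Theorem 6.2.6 can be checked on a small case. In the sketch below the matrices A and P are hypothetical choices made for illustration (they happen to give determinant -2 and trace 5); they are not claimed to be the matrices of the example above.

```python
from fractions import Fraction

def mul2(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace2(M):
    return M[0][0] + M[1][1]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    d = Fraction(det2(M))
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def char_poly2(M):
    """Coefficients of det(M - x I) = x^2 - trace(M) x + det(M), highest power first."""
    return [1, -trace2(M), det2(M)]

A = [[1, 2], [3, 4]]   # hypothetical example matrix
P = [[1, 1], [0, 1]]   # hypothetical invertible matrix
B = mul2(mul2(P, A), inv2(P))  # B = P A P^{-1}, so B is similar to A

print(det2(A), det2(B))
print(trace2(A), trace2(B))
print(char_poly2(A), char_poly2(B))
```

Although A and B have different entries, the determinant, the trace and the characteristic polynomial agree.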
Estimates of eigenvalues

Usually elementary row operations do not preserve eigenvalues. For example,
the permutation matrix

\[ E = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

has eigenvalues $\lambda = \pm 1$. But $EE = I$ has eigenvalues $\lambda_{1,2} = 1$
only. However, we can still obtain some information about the eigenvalues
without actually computing them. Consider the characteristic polynomial

\[ \det(A - \lambda I), \]

and assume it can be factored into

\[ \det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda), \]

where $\lambda_i$, $i = 1, 2, \dots, n$, are eigenvalues of A. Putting $\lambda = 0$,
we have

\[ \det(A) = \lambda_1\lambda_2 \cdots \lambda_n. \]

The next estimate on eigenvalues makes use of the fact that every square
matrix A can be triangularized by an invertible matrix P such that

\[ P^{-1}AP = D, \]

where D is either a diagonal matrix with eigenvalues of A in the main diagonal,
or D is the Jordan form (see Theorem 6.3.8) of A, which is a triangular
matrix with eigenvalues of A in the main diagonal. Note that repeated eigenvalues
all appear in the main diagonal of D. Then we have

\[ \operatorname{trace}(D) = \operatorname{trace}(P^{-1}AP) = \operatorname{trace}(PP^{-1}A) = \operatorname{trace}(A). \]

In summary, we arrive at

\[ \operatorname{trace}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n, \]

where $\lambda_i$, $i = 1, 2, \dots, n$, are eigenvalues of A.
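
As a quick sanity check of the two identities, the sketch below uses the matrix of Example 6.1.3, whose eigenvalues -1, 1, 2 were computed there. The helper `det3` is an illustrative cofactor expansion written for this note.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[-1, 0, 1], [-6, 2, 3], [0, 0, 1]]  # matrix of Example 6.1.3
eigenvalues = [-1, 1, 2]                 # eigenvalues found in Example 6.1.3

trace_A = sum(A[k][k] for k in range(3))
product = 1
for lam in eigenvalues:
    product *= lam

print(det3(A), product)           # determinant vs. product of eigenvalues
print(trace_A, sum(eigenvalues))  # trace vs. sum of eigenvalues
```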


Another estimate of eigenvalues makes use of the dominant matrix. Recall
that every dominant matrix is invertible (see Example 2.3.8). However, we
know that if $\lambda_0$ is an eigenvalue of A, then $A - \lambda_0 I$ is not
invertible and hence not dominant. That is, there exists a row number
$k \in \{1, 2, \dots, n\}$ such that $\lambda_0$ is in the following so-called
Gershgorin disc:

\[ |a_{kk} - \lambda_0| \leq \sum_{i=1,\, i \neq k}^{n} |a_{ki}|. \]

That is,

Theorem 6.2.8 (Gershgorin’s disc theorem). Let A be an n x n
matrix. Then every eigenvalue $\lambda_0$ of A lies in at least one of the discs
centered at the main diagonal entries:

\[ \lambda_0 \in \bigcup_{k=1}^{n} D_k, \]

where

\[ D_k = \Big\{ z \in \mathbb{C} : |z - a_{kk}| \leq \sum_{i=1,\, i \neq k}^{n} |a_{ki}| \Big\},\quad k \in \{1, 2, \dots, n\}. \]

Note that since A and $A^T$ share the same set of eigenvalues, Gershgorin’s
disc theorem also applies with Gershgorin’s discs obtained according to the
columns of A.


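Example 6.2.9. Consider

\[ A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 2 & -1 & 4 \end{bmatrix}. \]

Then we have $D_1 = \{z \in \mathbb{C} : |z - 1| \leq 1\}$,
$D_2 = \{z \in \mathbb{C} : |z - 2| \leq 1\}$ and
$D_3 = \{z \in \mathbb{C} : |z - 4| \leq 3\}$. Then every eigenvalue of A is in the
union $D_1 \cup D_2 \cup D_3$.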
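
The row discs of Example 6.2.9 can be generated directly from the entries. The sketch below is illustrative; the function names are choices made here, and the eigenvalues are obtained by hand from the factorization $\det(A - \lambda I) = (4 - \lambda)(\lambda^2 - 3\lambda + 1)$, which follows from expanding along the third column.

```python
def gershgorin_discs(A):
    """Return one (center, radius) pair per row:
    center a_kk, radius = sum of |a_ki| over i != k."""
    return [(row[k], sum(abs(v) for i, v in enumerate(row) if i != k))
            for k, row in enumerate(A)]

def in_some_disc(z, discs):
    """True if the complex or real number z lies in at least one closed disc."""
    return any(abs(z - center) <= radius for center, radius in discs)

A = [[1, -1, 0], [-1, 2, 0], [2, -1, 4]]
discs = gershgorin_discs(A)
print(discs)

# Eigenvalues of A: roots of x^2 - 3x + 1 together with 4.
eigenvalues = [(3 - 5 ** 0.5) / 2, (3 + 5 ** 0.5) / 2, 4.0]
print([in_some_disc(lam, discs) for lam in eigenvalues])
```

Applying the same function to the transpose of A gives the column discs mentioned after Theorem 6.2.8.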


Theorem 6.2.10 (Gershgorin’s second disc theorem). (optional)
Let A be an n x n matrix. A subset S of the Gershgorin discs
is called a disjoint group of discs if no disc in the group S intersects a
disc which is not in S. If a disjoint group S contains r nonconcentric
discs, then there are r eigenvalues in S.

Proof. Let A(t) be the matrix obtained from A with the off-diagonal elements
multiplied by the variable t, where $t \in [0, 1]$. Note that A(0) is the diagonal
matrix with the same diagonal as A, and A(1) = A.

The eigenvalues of A(0) are its n main diagonal entries, and its n Gershgorin
discs reduce to these points. As t ranges from 0 to 1, the Gershgorin discs
change their radii with centers fixed at the main diagonal entries. Moreover,
the eigenvalues also move inside the Gershgorin discs.

Since the roots of the characteristic polynomial $\det(A(t) - \lambda I)$ of A(t)
change continuously with respect to $t \in [0, 1]$, the traces of the eigenvalues
are continuous curves inside the Gershgorin discs. If a disjoint group S contains
r nonconcentric discs, then there are r eigenvalues which never escape from
the group S during the change of t from 0 to 1. □


Cayley—Hamilton theorem (optional) 


Let $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m$ be a polynomial and A an
n x n matrix. If A has an eigenvalue $\lambda$ and eigenvector $u \in \mathbb{C}^n$,
how can we obtain an eigenvalue and eigenvector of
$p(A) = a_0I + a_1A + a_2A^2 + \cdots + a_mA^m$? Note that we have

\[ \begin{aligned}
Au &= \lambda u, \\
A^2u &= \lambda^2 u, \\
&\ \ \vdots \\
A^mu &= \lambda^m u.
\end{aligned} \]

Then we have

\[ p(A)u = p(\lambda)u. \]

Namely, every eigenvector of A is an eigenvector of the polynomial p(A) of
A, and if $\lambda$ is an eigenvalue of A then $p(\lambda)$ is an eigenvalue of p(A).
It is then interesting to ask whether it is possible that for some polynomial p,
p(A) has eigenvectors which are not eigenvectors of A. It turns out that the
Cayley-Hamilton theorem claims that there exists a polynomial p such that
p(A) = 0, which means every nonzero vector x is an eigenvector of p(A).
Therefore, there may exist eigenvectors of p(A) which are not eigenvectors of A.

Suppose that p is the characteristic polynomial of A, and A is diagonalizable.
Then we have n linearly independent eigenvectors and $p(\lambda) = 0$ for
every eigenvalue $\lambda$, so that $p(A)u = p(\lambda)u = 0$ for each of these
eigenvectors u. It follows that p(A) = 0 because the eigenvectors form a basis
for $\mathbb{R}^n$. Moreover, it seems that p(A) = 0 holds true even if A is not
diagonalizable. For example, $p(\lambda) = (\lambda - 1)^2$ is the characteristic
polynomial of

\[ J = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \]

and J is not diagonalizable. But $p(J) = (J - I)^2 = 0$.
Consider the adjoint of $\lambda I - A$. Let $B = \operatorname{adj}(\lambda I - A)$.
By definition of adjoint, B can be represented as

\[ B = \sum_{i=0}^{n-1} \lambda^i B_i. \]

We have

\[ (\lambda I - A)\operatorname{adj}(\lambda I - A) = \det(\lambda I - A)I = p(\lambda)I, \]

where $p(\lambda) = a_0 + a_1\lambda + a_2\lambda^2 + \cdots + a_n\lambda^n$ is the
characteristic polynomial of A. Then we have

\[ \begin{aligned}
p(\lambda)I &= (a_0 + a_1\lambda + a_2\lambda^2 + \cdots + a_n\lambda^n)I \\
&= (\lambda I - A)\operatorname{adj}(\lambda I - A) \\
&= (\lambda I - A)B \\
&= (\lambda I - A)\sum_{i=0}^{n-1} \lambda^i B_i \\
&= \lambda^n B_{n-1} + \sum_{i=1}^{n-1} \lambda^i (B_{i-1} - AB_i) - AB_0.
\end{aligned} \]

By comparing coefficient matrices of $\lambda^i$, $i = 0, 1, 2, \dots, n$, we have

\[ \begin{aligned}
B_{n-1} &= a_nI, \\
B_{i-1} - AB_i &= a_iI,\quad 1 \leq i \leq n - 1, \\
-AB_0 &= a_0I.
\end{aligned} \]

Then multiplying both sides of the above equalities by $A^n, A^{n-1}, \dots, A^0$,
respectively, we have a telescoping sum on the left hand side that

\[ A^nB_{n-1} + A^{n-1}(B_{n-2} - AB_{n-1}) + A^{n-2}(B_{n-3} - AB_{n-2}) + \cdots + A(B_0 - AB_1) - AB_0 = 0, \]

but the right hand side is p(A). Therefore, we have p(A) = 0.

Theorem 6.2.11. Let A be an n x n matrix with characteristic
polynomial $p(\lambda)$. Then we have

\[ p(A) = 0. \]

The Cayley-Hamilton theorem has many applications. The following ex- 
amples shed some light on how it can be applied. 


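Example 6.2.12. Let

\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 0 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}. \]

The characteristic polynomial is
$p(\lambda) = \det(\lambda I - A) = (\lambda - 1)\big((\lambda - 1)(\lambda - 2) - 5\big) = \lambda^3 - 4\lambda^2 + 3$.
By the Cayley-Hamilton theorem we have

\[ A^3 - 4A^2 + 3I = 0, \]

which gives an expression of the highest term $A^3$ with

\[ A^3 = 4A^2 - 3I, \]

and an expression for the inverse of A,

\[ A^{-1} = -\frac{1}{3}\left(A^2 - 4A\right). \]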
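
The Cayley-Hamilton identity for this example can be verified by machine. The sketch below reads the matrix of Example 6.2.12 as having rows (1, 3, 5), (0, 1, 0), (1, 0, 2), and recomputes the coefficients of $\det(\lambda I - A)$ with the Faddeev-LeVerrier recursion, a standard technique that is swapped in here and is not the method of the text. All function names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients c[0..n] with det(x I - A) = sum of c[k] x^k (Faddeev-LeVerrier)."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

def poly_at_matrix(c, A):
    """Evaluate sum of c[k] A^k by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = matmul(result, A)
        result = [[result[i][j] + (coeff if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

A = [[1, 3, 5], [0, 1, 0], [1, 0, 2]]
c = char_poly(A)                      # constant term first
p_of_A = poly_at_matrix(c, A)         # should be the zero matrix

# From p(A) = 0: A (c1 I + c2 A + c3 A^2) = -c0 I, so the inverse follows.
q_of_A = poly_at_matrix(c[1:], A)
A_inv = [[-v / c[0] for v in row] for row in q_of_A]

print([str(v) for v in c])
print(matmul(A, A_inv))
```

The last step is only valid when the constant coefficient c[0] is nonzero, which by Theorem 6.1.4 is the case exactly when A is invertible.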


Exercise 6.2.13. 


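1. Determine the diagonalizability of the following matrices:

\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad
C = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}. \]

2. Let $P^2 = P$. Is it true that $\lambda = 1$ and $\lambda = 0$ are both eigenvalues
of P? Justify your answer.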


3. Let A, B be n x n matrices. Show that AB and BA have the same
characteristic polynomial. Hint: Use block eliminations on the matrix
$\begin{bmatrix} \lambda I & A \\ B & I \end{bmatrix}$ to produce $\lambda I - AB$ and
$\lambda I - BA$, respectively.


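4. Let A, B be similar n x n matrices with $A = P^{-1}BP$. Show that if
$\lambda_0$ is an eigenvalue of A with $x_0$ a corresponding eigenvector, then
$\lambda_0$ is an eigenvalue of B with $Px_0$ a corresponding eigenvector.

5. Let $A = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}$ and
$B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 6 \end{bmatrix}$. Show that A and B
are similar.

6. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 3 & 1 & 0 \end{bmatrix}$ and
$B = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 0 & 3 \\ 2 & 3 & 1 \end{bmatrix}$. Show that A and B
are similar.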


7. Let A, B be similar n x n matrices. Show that

i) A and B have the same nullity, namely, $\dim N(A) = \dim N(B)$;

ii) A and B have the same set of eigenvalues;

iii) If $\lambda_0$ is an eigenvalue of A, then
$\dim N(A - \lambda_0 I) = \dim N(B - \lambda_0 I)$;

iv) A is invertible if and only if B is invertible;

v) A is diagonalizable if and only if B is diagonalizable.


8. Let A be an n x n matrix. Show that A and AT have the same set of 
eigenvalues. 


9. Let A, B be n x n matrices. Show that AB and BA have the same set of 
eigenvalues. 

10. Show that similarity is an equivalence relation on $M_{nn}$. Namely,

i) (Reflexivity) For every $A \in M_{nn}$, A ~ A;

ii) (Symmetry) For every $A, B \in M_{nn}$, A ~ B implies B ~ A;

iii) (Transitivity) For every $A, B, C \in M_{nn}$, A ~ B and B ~ C imply A ~ C.


11. Find the Gershgorin’s discs for possible location of each of the eigenvalues 
of each of the following matrices. 
12. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix}$. Which ones
of the following discs in $\mathbb{C}$ may contain eigenvalues of A?

a) $\{z \in \mathbb{C} : |z + 4| < 10\}$; b) $\{z \in \mathbb{C} : |z - 6| < 10\}$;
c) $\{z \in \mathbb{C} : |z| < 6\}$; d) $\{z \in \mathbb{C} : |z + 6| < 1\}$.


13. Let A and B be n x n real matrices, where A is diagonalizable such
that $P^{-1}AP$ is diagonal for some invertible matrix P. Let r be the value
$\|P^{-1}BP\|_\infty$, where $\|\cdot\|_\infty$ is the infinity norm of n x n matrices
defined by

\[ \|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |a_{ij}|. \]

Use Gershgorin’s theorem to show that for every eigenvalue $\lambda_{A+B}$ of A + B,
there exists an eigenvalue $\lambda_A$ of A such that

\[ |\lambda_{A+B} - \lambda_A| \leq r. \]


14. Let A be an n x n real matrix. Show that if $\|A\|_\infty < 1$, then I - A is
invertible and


rd 


15. Let A be an m x n matrix and $x \in \mathbb{R}^n$. Show that

\[ \|Ax\|_\infty \leq \|A\|_\infty \|x\|_\infty. \]


16. Use the Cayley-Hamilton theorem to compute 47! if 


1 
0 
0 


Corr 


0 
1 
I 


6.3 Applications to differential equations 


Recall that the power series expansion of $e^t$ converges uniformly on every
closed interval $[-L, L]$, $L > 0$. Namely,

\[ e^t = \sum_{n=0}^{\infty} \frac{t^n}{n!} = 1 + t + \frac{t^2}{2!} + \cdots \]

for every $|t| \leq L$. If we have a definition of metric (or magnitude) for
matrices in $M_{nn}$, for instance, define the magnitude of A by
$\|A\| = \big(\sum_{i,j} a_{ij}^2\big)^{1/2}$, we can define $e^A$, $A \in M_{nn}$, by the
power series

\[ e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!} = I + A + \frac{A^2}{2!} + \cdots, \]

which converges uniformly for every $A \in M_{nn}$ with $\|A\| \leq L$.


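Proposition 6.3.1. If $A, B \in M_{nn}$ are such that AB = BA, then

\[ e^{A+B} = e^Ae^B. \]

Proof. If AB = BA, then by the binomial formula we have

\[ (A + B)^n = \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!} A^{n-k}B^k. \]

Moreover, we have

\[ \begin{aligned}
e^{A+B} &= \sum_{n=0}^{\infty} \frac{(A+B)^n}{n!}
= \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{1}{k!\,(n-k)!} A^{n-k}B^k \\
&= \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \frac{1}{j!\,k!} A^jB^k \\
&= \Big(\sum_{j=0}^{\infty} \frac{1}{j!} A^j\Big)\Big(\sum_{k=0}^{\infty} \frac{1}{k!} B^k\Big) \\
&= e^Ae^B.
\end{aligned} \]
□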

The following matrix-valued function is well defined in $\mathbb{R}$:

\[ \mathbb{R} \ni t \mapsto e^{At} \in M_{nn}, \]

with

\[ \frac{d}{dt}e^{At} = Ae^{At} = e^{At}A. \]

For every $c \in \mathbb{R}^n$, let $x(t) = e^{At}c$. We have

\[ \frac{d}{dt}x(t) = \frac{d}{dt}e^{At}c = Ae^{At}c = Ax(t). \]

Namely, $x(t) = e^{At}c$ is a solution of the differential equation

\[ \frac{d}{dt}x(t) = Ax(t), \]

with initial data $x(0) = c$. Conversely, if $\frac{d}{dt}x(t) = Ax(t)$ with
$x(t) \in \mathbb{R}^n$, we multiply both sides by $e^{-At}$ to obtain

\[ e^{-At}\frac{d}{dt}x(t) = e^{-At}Ax(t), \]

which leads to

\[ \frac{d}{dt}\left(e^{-At}x(t)\right) = 0. \]

Hence $e^{-At}x(t)$ is a constant vector c for all $t \in \mathbb{R}$, which satisfies

\[ e^{-At}x(t)\big|_{t=0} = c. \]

That is, $c = x(0)$.


Theorem 6.3.2. The solution of the system of linear differential equations

\[ \frac{d}{dt}x(t) = Ax(t),\qquad x(0) = c, \]

is $x(t) = e^{At}c$.

Remark 6.3.3. We call $\varphi(t) = e^{At}$ the standard fundamental solution
matrix of the system of linear differential equations $x'(t) = Ax(t)$, which
satisfies $\varphi(0) = I$. Note that for every invertible matrix $P \in M_{nn}$,
$e^{At}P$ satisfies that

\[ \frac{d}{dt}\left(e^{At}P\right) = A\left(e^{At}P\right). \]

Namely, each column of $e^{At}P$ is a solution of $x'(t) = Ax(t)$. Therefore, we
call $e^{At}P$ a fundamental solution matrix.

Remark 6.3.4. By Proposition 6.3.1, we have

\[ e^{A(t+s)} = e^{At}e^{As} \]

for all $s, t \in \mathbb{R}$ and hence

\[ e^{-At}e^{At} = I,\quad \text{for all } t \in \mathbb{R}. \]

Since $e^{At}$ is defined by an infinite series, it is in general not efficient to use
a series to compute the exact solutions of differential equations. We need to
find alternative methods for computing $e^{At}$.


A is diagonalizable 


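If A is diagonalizable, there exists an invertible matrix

\[ P = [x_1 : x_2 : \cdots : x_n] \in M_{nn} \]

such that

\[ P^{-1}AP = \Lambda = \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{bmatrix}. \]

We know that $x_i$, $i = 1, 2, \dots, n$, are eigenvectors of A associated with
eigenvalues $\lambda_i$, $i = 1, 2, \dots, n$, respectively. Then we have

\[ \begin{aligned}
e^{At} &= e^{P\Lambda P^{-1}t} \\
&= I + \frac{P\Lambda P^{-1}t}{1!} + \frac{P\Lambda^2P^{-1}t^2}{2!} + \cdots + \frac{P\Lambda^nP^{-1}t^n}{n!} + \cdots \\
&= Pe^{\Lambda t}P^{-1} \\
&= P\begin{bmatrix} e^{\lambda_1 t} & & & 0 \\ & e^{\lambda_2 t} & & \\ & & \ddots & \\ 0 & & & e^{\lambda_n t} \end{bmatrix}P^{-1}.
\end{aligned} \]

Then the solution of $\frac{d}{dt}x(t) = Ax(t)$ with initial value $x(0) = c$ can be
written

\[ x(t) = Pe^{\Lambda t}P^{-1}c = x_1e^{\lambda_1 t}c_1 + x_2e^{\lambda_2 t}c_2 + \cdots + x_ne^{\lambda_n t}c_n, \]

where $\begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}^T = P^{-1}c$. In summary, we have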
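Theorem 6.3.5. If A is diagonalizable, there exists
$P = [x_1 : x_2 : \cdots : x_n] \in M_{nn}$ such that

\[ P^{-1}AP = \Lambda = \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{bmatrix}. \]

The solution of the system of linear differential equations

\[ \frac{d}{dt}x(t) = Ax(t),\qquad x(0) = c, \]

is

\[ x(t) = Pe^{\Lambda t}P^{-1}c = x_1e^{\lambda_1 t}c_1 + x_2e^{\lambda_2 t}c_2 + \cdots + x_ne^{\lambda_n t}c_n, \]

where $\begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}^T = P^{-1}c$.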


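Example 6.3.6. Consider the system of linear equations

\[ \begin{aligned} x_1' &= x_1 + x_2, \\ x_2' &= 3x_1 - x_2. \end{aligned} \]

We have

\[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}' = \begin{bmatrix} 1 & 1 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \]

The characteristic polynomial of $A = \begin{bmatrix} 1 & 1 \\ 3 & -1 \end{bmatrix}$ is
$(\lambda - 1)(\lambda + 1) - 3$. The eigenvalues are $\lambda_1 = 2$, $\lambda_2 = -2$.
Solving the system $(A - \lambda_1 I)x = 0$ we obtain an eigenvector $x_1 = (1, 1)$.
Similarly, $(A - \lambda_2 I)x = 0$ leads to an eigenvector $x_2 = (1, -3)$.
Therefore, the solution is

\[ x(t) = x_1e^{\lambda_1 t}c_1 + x_2e^{\lambda_2 t}c_2
= \begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{2t}c_1 + \begin{bmatrix} 1 \\ -3 \end{bmatrix}e^{-2t}c_2, \]

where $(c_1, c_2)$ is determined by initial values of x(t). For instance, if
$x(0) = (1, 1)$, then $c_1 = 1$, $c_2 = 0$ and we have

\[ x(t) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{2t}. \]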
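
The formula of Theorem 6.3.5 can be tried out on this system. The sketch below takes the coefficient matrix with rows (1, 1) and (3, -1) and the eigenpairs 2 with (1, 1) and -2 with (1, -3) from the example; the function names and the finite-difference check are illustrative additions.

```python
import math

A = [[1.0, 1.0], [3.0, -1.0]]
EIGENPAIRS = [(2.0, (1.0, 1.0)), (-2.0, (1.0, -3.0))]  # (eigenvalue, eigenvector)

def coefficients(x0):
    """Solve c1*v1 + c2*v2 = x0, i.e. c = P^{-1} x0, by Cramer's rule."""
    (_, v1), (_, v2) = EIGENPAIRS
    det = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
    c2 = (v1[0] * x0[1] - v1[1] * x0[0]) / det
    return c1, c2

def solution(t, x0):
    """x(t) = sum over eigenpairs of v_k * exp(lambda_k * t) * c_k."""
    c = coefficients(x0)
    return [sum(ck * math.exp(lam * t) * v[i] for ck, (lam, v) in zip(c, EIGENPAIRS))
            for i in range(2)]

def residual(t, x0, h=1e-6):
    """Largest entry of |x'(t) - A x(t)|, with x'(t) from a central difference."""
    xp, xm, x = solution(t + h, x0), solution(t - h, x0), solution(t, x0)
    deriv = [(a - b) / (2 * h) for a, b in zip(xp, xm)]
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return max(abs(d - y) for d, y in zip(deriv, Ax))

print(solution(0.0, (1.0, 1.0)))   # reproduces the initial value
print(residual(0.3, (2.0, -1.0)))  # should be close to zero
```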


Remark 6.3.7. If there is a complex eigenvalue $\lambda = a + bi$, we can replace
the complex valued solutions $e^{(a \pm bi)t}$ with the following real-valued ones,

\[ \frac{e^{at + bti} + e^{at - bti}}{2} = e^{at}\cos bt, \quad\text{and}\quad
\frac{e^{at + bti} - e^{at - bti}}{2i} = e^{at}\sin bt. \]
A is not diagonalizable (optional)

If A is not diagonalizable, then there exists at least one eigenvalue with
geometrical multiplicity less than its algebraic multiplicity. Then A is similar
to a matrix J which is called the Jordan form. (See Chapter 8 for details.)

Theorem 6.3.8. If $A \in M_{nn}$ has $s \leq n$ linearly independent
eigenvectors, it is similar to a matrix J which has s Jordan blocks
$J_1, J_2, \dots, J_s$, where

\[ J = \begin{bmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_s \end{bmatrix},\qquad
J_j = \begin{bmatrix} \lambda_j & 1 & & \\ & \lambda_j & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_j \end{bmatrix}, \]

$j = 1, 2, \dots, s$, and the size $n_j$ of $J_j$ is the algebraic multiplicity of
the eigenvalue $\lambda_j$ with $n_1 + n_2 + \cdots + n_s = n$.


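Let $A = PJP^{-1}$, where J is the Jordan form of A. We have

\[ e^{At} = Pe^{Jt}P^{-1}. \]

Notice that every Jordan block $J_j$ can be written as
$J_j = \lambda_jI + (J_j - \lambda_jI)$ with

\[ J_j - \lambda_jI = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}. \]

Moreover, the line of 1's in $N_j = J_j - \lambda_jI$ moves parallel with the main
diagonal to the upper right corner when the power k of $N_j^k$ increases through
$k = 1, 2, \dots, n_j$, and $N_j^{n_j} = 0$. Notice that identity matrix multiplication
is commutative with any matrix. We have

\[ \begin{aligned}
e^{J_jt} = e^{\lambda_jIt + N_jt}
&= e^{\lambda_jt}\left(I + tN_j + \frac{t^2}{2!}N_j^2 + \cdots + \frac{t^{n_j-1}}{(n_j-1)!}N_j^{n_j-1}\right) \\
&= e^{\lambda_jt}\begin{bmatrix}
1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{n_j-1}}{(n_j-1)!} \\
 & 1 & t & \cdots & \frac{t^{n_j-2}}{(n_j-2)!} \\
 & & \ddots & \ddots & \vdots \\
 & & & 1 & t \\
0 & & & & 1
\end{bmatrix}.
\end{aligned} \]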

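Notice that by Remark 6.3.3, we know that $e^{At}P$ is a fundamental solution
matrix. That is,

\[ e^{At}P = Pe^{Jt} \]

is a fundamental solution matrix. Ideally, once we have the Jordan form J,
we also have the matrix P to obtain the fundamental solution matrix. However,
in the practice of solving linear systems of differential equations, the matrix P
for producing the Jordan form $J = P^{-1}AP$ is not known a priori, although we
know it exists once we have found that A is non-diagonalizable. In the
following, we assume that J is known but P is unknown, and we try to figure
out how to obtain P.

Let $\{r_{j1}, r_{j2}, \dots, r_{jn_j}\}$ be the columns of P corresponding to the Jordan
block $e^{J_jt}$ in $Pe^{Jt}$. Then we have

\[ A[r_{j1} : r_{j2} : \cdots : r_{jn_j}] = [r_{j1} : r_{j2} : \cdots : r_{jn_j}]J_j. \]

That is,

\[ \begin{aligned}
Ar_{j1} &= \lambda_jr_{j1}, \\
Ar_{j2} &= r_{j1} + \lambda_jr_{j2}, \\
Ar_{j3} &= r_{j2} + \lambda_jr_{j3}, \\
&\ \ \vdots \\
Ar_{jn_j} &= r_{j(n_j-1)} + \lambda_jr_{jn_j},
\end{aligned} \]

which is equivalent to

\[ \begin{aligned}
(A - \lambda_jI)r_{j1} &= 0, \\
(A - \lambda_jI)r_{j2} &= r_{j1}, \\
(A - \lambda_jI)r_{j3} &= r_{j2}, \\
&\ \ \vdots \\
(A - \lambda_jI)r_{jn_j} &= r_{j(n_j-1)},
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
(A - \lambda_jI)r_{j1} &= 0, \\
(A - \lambda_jI)^2r_{j2} &= 0, \\
(A - \lambda_jI)^3r_{j3} &= 0, \\
&\ \ \vdots \\
(A - \lambda_jI)^{n_j}r_{jn_j} &= 0.
\end{aligned} \]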


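Then $\{r_{j1}, r_{j2}, \dots, r_{jn_j}\}$ are all nonzero vectors; otherwise, the above
equalities lead to $r_{j1} = 0$, contradicting the assumption that $r_{j1}$ is an
eigenvector. Moreover, we have

\[ \begin{aligned}
(A - \lambda_jI)^{n_j}r_{jn_j} &= 0, \\
r_{j(n_j-1)} &= (A - \lambda_jI)r_{jn_j}, \\
r_{j(n_j-2)} &= (A - \lambda_jI)r_{j(n_j-1)} = (A - \lambda_jI)^2r_{jn_j}, \\
r_{j(n_j-3)} &= (A - \lambda_jI)r_{j(n_j-2)} = (A - \lambda_jI)^3r_{jn_j}, \\
&\ \ \vdots \\
r_{j2} &= (A - \lambda_jI)r_{j3} = (A - \lambda_jI)^{n_j-2}r_{jn_j}, \\
r_{j1} &= (A - \lambda_jI)r_{j2} = (A - \lambda_jI)^{n_j-1}r_{jn_j}.
\end{aligned} \]

Now the algorithm for P is clear: First we solve $(A - \lambda_jI)^{n_j}r_{jn_j} = 0$
for a nonzero vector $r_{jn_j}$, chosen so that $(A - \lambda_jI)^{n_j-1}r_{jn_j} \neq 0$.
Then we obtain the set $\{r_{j1}, r_{j2}, \dots, r_{jn_j}\}$ backward in order by
multiplying $(A - \lambda_jI)^s$, $s = 1, 2, \dots, n_j - 1$, on $r_{jn_j}$, to obtain
$r_{j(n_j-1)}, r_{j(n_j-2)}, \dots, r_{j1}$.

Note that $(A - \lambda_jI)^{n_j}r_{jn_j} = 0$ has at least one nonzero solution
$r_{jn_j}$ since $\det((A - \lambda_jI)^{n_j}) = 0$. One can show that
$\{r_{j1}, r_{j2}, \dots, r_{jn_j}\}$ determined by the above equalities is linearly
independent (see exercise). Then by Theorem 6.2.3, the matrix P so obtained is
invertible.

The columns of $Pe^{Jt}$ corresponding to the block for $\lambda_j$ are then

\[ e^{\lambda_jt}r_{j1},\quad e^{\lambda_jt}(tr_{j1} + r_{j2}),\quad \dots,\quad
e^{\lambda_jt}\left(\frac{t^{n_j-1}}{(n_j-1)!}r_{j1} + \frac{t^{n_j-2}}{(n_j-2)!}r_{j2} + \cdots + tr_{j(n_j-1)} + r_{jn_j}\right). \]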
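Example 6.3.9. Find the general solution of $\dot{x} = Ax$ with

\[ A = \begin{bmatrix} 3 & 1 & 0 \\ -4 & -1 & 0 \\ 4 & -8 & -2 \end{bmatrix}. \]

Then $\det(A - \lambda I) = -(\lambda + 2)(\lambda - 1)^2$, which has eigenvalues
$\lambda_1 = -2$ and $\lambda_2 = 1$, the latter with algebraic multiplicity 2.

For $\lambda_1 = -2$, solving $(A - \lambda_1I)x = 0$ we obtain an eigenvector

\[ r_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]

For $\lambda_2 = 1$, we solve $(A - \lambda_2I)^2x = 0$ to obtain

\[ r_{22} = \begin{bmatrix} 11 \\ -7 \\ 0 \end{bmatrix},\qquad r_{20} = \begin{bmatrix} 3 \\ -6 \\ 20 \end{bmatrix}. \]

We note that

\[ r_{21} = (A - \lambda_2I)r_{22} = \begin{bmatrix} 15 \\ -30 \\ 100 \end{bmatrix} \neq 0,
\quad\text{but}\quad (A - \lambda_2I)r_{20} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

Therefore $e^{\lambda_2t}r_{20}$ can become an independent column of the fundamental
solution matrix. But according to our algorithm, we use $r_{22}$ instead, which
produces another independent vector $r_{21}$. Then the fundamental solution
matrix $Pe^{Jt}$ is

\[ \left[e^{-2t}r_1 : e^{t}r_{21} : e^{t}(tr_{21} + r_{22})\right]
= \begin{bmatrix} 0 & 15e^{t} & (11 + 15t)e^{t} \\ 0 & -30e^{t} & (-7 - 30t)e^{t} \\ e^{-2t} & 100e^{t} & 100te^{t} \end{bmatrix}. \]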
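
The chain of generalized eigenvectors in Example 6.3.9 can be checked with integer arithmetic. The sketch below assumes the vectors as read from the example, namely (0, 0, 1) for the eigenvalue -2 and (11, -7, 0), (3, -6, 20) for the eigenvalue 1; the helper names are illustrative.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def shift(M, lam):
    """Return M - lam * I."""
    n = len(M)
    return [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

A = [[3, 1, 0], [-4, -1, 0], [4, -8, -2]]
N = shift(A, 1)                 # A - I for the repeated eigenvalue 1

r1 = [0, 0, 1]                  # eigenvector for eigenvalue -2
r22 = [11, -7, 0]               # solves (A - I)^2 x = 0 but not (A - I) x = 0
r21 = matvec(N, r22)            # next vector down the chain, an eigenvector
r20 = [3, -6, 20]               # already an eigenvector, so it starts no chain

P = [[r1[i], r21[i], r22[i]] for i in range(3)]   # columns r1, r21, r22
J = [[-2, 0, 0], [0, 1, 1], [0, 0, 1]]            # Jordan form for this ordering

print(r21)
print(matvec(N, r21), matvec(N, r20))
print(matmul(A, P) == matmul(P, J))
```

The final comparison confirms AP = PJ, which is the relation the algorithm is designed to produce.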
Exercise 6.3.10. 


1. Solve the following systems of linear equations. 


1) 
tı =%1 + T2, 
£2 =X1 — T2. 


2) 
tı SI + Za, 
T2 =11 T T2. 
3) 
LT] = — Tı + T2, 
T2 =T2 — La, 
T3 =T1 — 4x3. 
4) 
tı =2ri + 222, 
t2 =— T2 + 23, 
da =273. 


2. Use change of variables to transform the following differential equations into
systems of differential equations, and then solve them.


i)2"—a—a=0. 

ii) 2" — x” — 22’ =0. 
iii) 7” + 3a’ + 2a = 0. 
iv) x” + 3a" + 22’ = 0. 


3. Solve the following system of differential equations 


E =11 + Za, 


La = — £1 +29. 
4. Solve the following system of differential equations

\[ \begin{aligned} \dot{x}_1 &= \alpha x_1 - \beta x_2, \\ \dot{x}_2 &= \beta x_1 + \alpha x_2, \end{aligned} \]

where $\alpha, \beta \in \mathbb{R}$ with $\beta \neq 0$.

5. Solve the following system of differential equations

\[ \begin{aligned} \dot{x}_1 &= x_1 + x_2, \\ \dot{x}_2 &= x_2 + x_3, \\ \dot{x}_3 &= x_3. \end{aligned} \]

6. Let $v \neq 0$ be a nonzero vector in $\mathbb{C}^n$ and $B \in M_{nn}$. Show that if
there exists $k \in \mathbb{N}$ such that $B^kv = 0$ but $B^jv \neq 0$ for
$1 \leq j < k$, then $\{v, Bv, B^2v, \dots, B^{k-1}v\}$ is linearly independent.


6.4 Symmetric matrices and quadratic forms 


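Example 6.4.1. Let $p(x) = 2x_1^2 + 2x_2^2 + 2x_1x_2$ be a real-valued polynomial
of $x = (x_1, x_2) \in \mathbb{R}^2$. Then p(x) can be rewritten as

\[ p(x) = x^TAx = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \]

where $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ is a symmetric real matrix. In order
to remove the mixed term $x_1x_2$, we may find a change of variables x = Py with
P invertible such that

\[ p(x) = (Py)^TAPy = y^T(P^TAP)y, \]

and $P^TAP = \operatorname{diag}(\lambda_1, \lambda_2)$ is diagonal. Namely,

\[ p(Py) = y^T\operatorname{diag}(\lambda_1, \lambda_2)y = \lambda_1y_1^2 + \lambda_2y_2^2. \]

Indeed, if

\[ P = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix}, \]

then the change of variables x = Py leads to

\[ p(x) = p(Py) = y^T(P^TAP)y = \begin{bmatrix} y_1 & y_2 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}. \]

That is,

\[ p(Py) = y^T\operatorname{diag}(3, 1)y = 3y_1^2 + y_2^2. \]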
From the above example, it is natural to ask whether or not for every real
quadratic polynomial in $x \in \mathbb{R}^n$ there exists a change of variables
x = Py such that $p(x) = x^TAx$ can be rewritten as

\[ p(Py) = y^T\operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)y, \]

with P an invertible matrix and $P^TAP = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$.
That is, we look for P such that $P^TAP$ is diagonal. Recall that A is
diagonalizable if there is an invertible matrix P such that $P^{-1}AP$ is diagonal,
and that $P^T = P^{-1}$ if P is an orthogonal matrix.

Definition 6.4.2. A matrix A is called orthogonally diagonalizable if
there is an orthogonal matrix P and a diagonal matrix D for which

\[ P^TAP = D. \]


Now we are interested in what kind of matrices are orthogonally diagonal- 
izable. First we have 


Theorem 6.4.3. If A is orthogonally diagonalizable, then it is
symmetric.

Proof: Since A is orthogonally diagonalizable, there is an orthogonal
matrix P and a diagonal matrix D for which $P^TAP = D$. Then we have
$A = (P^T)^{-1}DP^{-1} = PDP^T$ since $P^T = P^{-1}$. Therefore,

\[ A^T = (PDP^T)^T = PD^TP^T = PDP^T = A. \]

A is symmetric. □
Next we are curious whether every real symmetric matrix is orthogonally 
diagonalizable. It turns out to be true. But we need some preparations. 


Theorem 6.4.4. If A is a real symmetric matrix, it has n real
eigenvalues. That is, every eigenvalue of A is real.

Proof: Suppose that $Ax = \lambda x$ with $x \neq 0$. Then we have
$\overline{Ax} = \overline{\lambda x}$, which leads to
$A\bar{x} = \bar{\lambda}\bar{x}$ since A is real. Taking transposes on both sides
and using $A^T = A$, we obtain

\[ \bar{x}^TA = \bar{\lambda}\bar{x}^T \implies \bar{x}^TAx = \bar{\lambda}\bar{x}^Tx. \]

Multiplying $\bar{x}^T$ on both sides of $Ax = \lambda x$, we obtain that

\[ \bar{x}^TAx = \lambda\bar{x}^Tx. \]

Then we have

\[ \bar{x}^TAx = \lambda\bar{x}^Tx = \bar{\lambda}\bar{x}^Tx, \]

leading to $(\lambda - \bar{\lambda})\bar{x}^Tx = 0$. Since $\bar{x}^Tx > 0$, we
have $\lambda = \bar{\lambda}$. Hence $\lambda$ is real. □

We leave it as an exercise to show that

Theorem 6.4.5. If A is a real symmetric matrix, then eigenvectors of A
corresponding to different eigenvalues are perpendicular. That is,
eigenvectors corresponding to different eigenvalues are orthogonal.


Theorem 6.4.6. A real matrix A is orthogonally diagonalizable if and 
only if A is symmetric. 


Proof: “⟹”: Done with Theorem 6.4.3.

“⟸”: We use mathematical induction on the size of the n x n matrix
A. For n = 1, A is a single element matrix, say A = [a]. Let Q = [1]. Then
$Q^TAQ = [1]^T[a][1] = [a]$ is diagonal and Q = [1] is an orthogonal matrix.

Now we consider the general case n and, for an inductive assumption, we
suppose that every (n - 1) x (n - 1) real symmetric matrix is orthogonally
diagonalizable.

Let $\lambda_1$ be a real eigenvalue of A and $v_1$ a real unit eigenvector of A. We
can extend $v_1$ into an orthonormal basis of $\mathbb{R}^n$. Group the basis into a
matrix $U = [v_1 : v_2 : \cdots : v_n]$. Then U is an orthogonal matrix with
$U^T = U^{-1}$ and $U^{-1}AU$ is symmetric. Moreover, the first column of
$U^{-1}AU$ is

\[ U^{-1}AUe_1 = U^{-1}Av_1 = \lambda_1U^{-1}v_1 = \lambda_1e_1, \]

where $e_1$ is the first vector of the standard basis $\{e_1, e_2, \dots, e_n\}$.
Therefore, we have

\[ U^TAU = \begin{bmatrix} \lambda_1 & 0 \\ 0 & C \end{bmatrix}, \]

where C is an (n - 1) x (n - 1) symmetric real matrix. By the inductive
assumption, C is orthogonally diagonalizable. That is, there exists an
(n - 1) x (n - 1) orthogonal matrix P such that

\[ P^TCP = \operatorname{diag}(\lambda_2, \lambda_3, \dots, \lambda_n). \]

Let $V = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix}$. Then V is orthogonal and

\[ V^TU^TAUV = \begin{bmatrix} 1 & 0 \\ 0 & P^T \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & C \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & P^TCP \end{bmatrix}
= \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_n). \]

Let Q = UV. Then Q is an orthogonal matrix such that $Q^TAQ$ is diagonal. □


An immediate consequence of Theorem 6.4.6 is the following spectral 
decomposition theorem: 
Theorem 6.4.7. If A is an n x n real symmetric matrix, then A
has an orthogonal eigenvector matrix $Q = [q_1 : q_2 : \cdots : q_n]$ such that

\[ A = Q\Lambda Q^T = \lambda_1q_1q_1^T + \lambda_2q_2q_2^T + \cdots + \lambda_nq_nq_n^T, \]

where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_n)$ and
the main diagonal entries are eigenvalues of A.


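Example 6.4.8. Let

\[ A = \begin{bmatrix} 2 & -2 & 0 \\ -2 & 1 & -2 \\ 0 & -2 & 0 \end{bmatrix}. \]

Find a spectral decomposition of A.

Solution: Consider $\det(A - \lambda I) = 0$. We have
$\det(A - \lambda I) = -(\lambda - 1)(\lambda - 4)(\lambda + 2) = 0$ and the eigenvalues
$\lambda_1 = 1$, $\lambda_2 = 4$, $\lambda_3 = -2$. Solving $(A - \lambda_1I)x = 0$
we have a unit eigenvector,

\[ q_1 = \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \\ -\frac{2}{3} \end{bmatrix}. \]

Similarly, solving $(A - \lambda_2I)x = 0$ and $(A - \lambda_3I)x = 0$, we have unit
eigenvectors corresponding to eigenvalues $\lambda_2 = 4$ and $\lambda_3 = -2$,
respectively:

\[ q_2 = \begin{bmatrix} \frac{2}{3} \\ -\frac{2}{3} \\ \frac{1}{3} \end{bmatrix},\qquad
q_3 = \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix}. \]

Then we have an orthogonal matrix,

\[ Q = \frac{1}{3}\begin{bmatrix} 2 & 2 & 1 \\ 1 & -2 & 2 \\ -2 & 1 & 2 \end{bmatrix}, \]

such that

\[ Q^TAQ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -2 \end{bmatrix}. \]

Moreover, we have

\[ \begin{aligned}
A = Q\Lambda Q^T &= \lambda_1q_1q_1^T + \lambda_2q_2q_2^T + \lambda_3q_3q_3^T \\
&= \frac{1}{9}\begin{bmatrix} 4 & 2 & -4 \\ 2 & 1 & -2 \\ -4 & -2 & 4 \end{bmatrix}
+ \frac{4}{9}\begin{bmatrix} 4 & -4 & 2 \\ -4 & 4 & -2 \\ 2 & -2 & 1 \end{bmatrix}
- \frac{2}{9}\begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix}.
\end{aligned} \]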
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Example 6.4.9. Let f(x) = x? + 223 + 323 — 40,03 — 40203 with £z = 
(11, 2, 13) € RÌ. Find an orthogonal transformation x = Py so that f(Py) 
has no mixed terms of y. 

Solution: f(x) = xT Ax with 


| —2 0 
A=|-2 2 =D 
0 -2 3 


The characteristic polynomial of A is det(A — AI) = —(A — 2)(A + DA — 5) 
with eigenvalues A = 2, A2 = 5, Ag = —1. Solving (A — AI) = 0 for each of 
the eigenvalues, we obtain the following unit eigenvectors: 


2 1 2 
3 3 3 
Pı = -4 ,P2 = -2 »P3 = 2 
22 2 1 
3 3 3 
Then we have the orthogonal matrix P given by 
1 Zo TEO 
P= lp :p2:p3]= 3 A 2 
—2 21 
Then we have 
2 0 0 
FPy)=y P"APy =y" [0 5 Of y= 2y] + 7y3 — y3- 
0 0 -1 
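
The two diagonalizations above can be checked mechanically. The following is a minimal sketch in exact rational arithmetic, assuming the eigenvectors $(2, 1, -2)/3$, $(2, -2, 1)/3$, $(1, 2, 2)/3$ for the matrix of Example 6.4.8 and $(2, -1, -2)/3$, $(1, -2, 2)/3$, $(2, 2, 1)/3$ for the matrix of Example 6.4.9. The helper names (`diagonalize`, `thirds`) are illustrative only and do not come from the text.

```python
from fractions import Fraction as F

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def diagonalize(A, columns):
    """Return Q^T A Q, where the given vectors are the columns of Q."""
    Q = transpose(columns)
    return matmul(matmul(transpose(Q), A), Q)

def thirds(*entries):
    """Hypothetical helper: the vector (entries) / 3 as exact fractions."""
    return [F(e, 3) for e in entries]

# Matrix of Example 6.4.8 with eigenvalues 1, 4, -2.
A1 = [[2, -2, 0], [-2, 1, -2], [0, -2, 0]]
Q_cols = [thirds(2, 1, -2), thirds(2, -2, 1), thirds(1, 2, 2)]

# Matrix of Example 6.4.9 with eigenvalues 2, 5, -1.
A2 = [[1, -2, 0], [-2, 2, -2], [0, -2, 3]]
P_cols = [thirds(2, -1, -2), thirds(1, -2, 2), thirds(2, 2, 1)]

D1 = diagonalize(A1, Q_cols)
D2 = diagonalize(A2, P_cols)
print(D1 == [[1, 0, 0], [0, 4, 0], [0, 0, -2]])
print(D2 == [[2, 0, 0], [0, 5, 0], [0, 0, -1]])
```

Exact fractions are used instead of floating point so that the comparison with the diagonal matrices is an equality rather than an approximation.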


Exercise 6.4.10. 


1. Find a real symmetric matrix $A$ such that each of the following polynomials can be represented as $x^TAx$, $x = (x_1, x_2, \cdots, x_n) \in \mathbb{R}^n$, and find a change of variables $x = Py$ with $y = (y_1, y_2, \cdots, y_n) \in \mathbb{R}^n$ such that $f(Py)$ contains no mixed terms $y_iy_j$, $i \neq j$.


1) f(x) =x? +x} + grizo. 

2) f(x) = 27 + 22 + 2x172. 

3) f(z) =x? + 12 + x} aura + ruta + 2223. 

4) f(v) =a} +23 + 23 + Qari + 22103 + 20223. 
5) f(x) =x? + z2 + zuza. 

6) f(x) = —a2 — z2 + zuza + 21223. 

7) f(z) =a} +03 -— r} + 23. 

8) f@) =e? + DS tits 
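
2. Let $p(x) = x^TAx$ be a quadratic polynomial of $x \in \mathbb{R}^3$ with

\[ A = \begin{bmatrix} 2 & 2 & -2 \\ 2 & 5 & -4 \\ -2 & -4 & 5 \end{bmatrix}. \]

Find a change of variables $x = Py$ with $y = (y_1, y_2, y_3) \in \mathbb{R}^3$ such that
\[ p(Py) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \lambda_3 y_3^2, \]
for some $\lambda_i \in \mathbb{R}$, $i = 1, 2, 3$.

3. Find a spectral decomposition of each of the following matrices.

\[ A = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 & 2 \\ 0 & 2 & 0 \\ 2 & 0 & 0 \end{bmatrix}. \]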


4. Find a spectral decomposition of each of the following matrices. 


0 0 2 2 00 2 2 
0 2 2 0 02 2 2 
A=|o 2 0o op P=l2 220 
2000 22.00 
5. Let $u = [1 \ 1 \ \cdots \ 1]$ be a $1 \times n$ matrix and $A = I + u^Tu$. Find a spectral decomposition of $A$.

6. Prove Theorem 6.4.5. 


7. Let A be a real symmetric matrix with rank k. Show that A can be written 
as the sum of k symmetric matrices with rank 1. 


8. Let $A$ be a $2 \times 2$ real symmetric matrix with eigenvalues $\lambda = \pm 1$. Show that $A$ is an orthogonal matrix.


9. Let A be an n x n skew-symmetric real matrix. Show that every nonzero 
eigenvalue of A is purely imaginary. 


10. Let A and B be real symmetric matrices. Show that A is similar to B if 
and only if A and B have the same set of eigenvalues. 


11. Let A be an invertible symmetric real matrix and B a skew-symmetric 
real matrix. Show that if AB = BA, then 


i) $A + B$ and $A - B$ are invertible;
ii) $(A + B)^{-1}(A - B)$ and $(A - B)^{-1}(A + B)$ are orthogonal matrices.

12. (Schur factorization). Let $A$ be an $n \times n$ real matrix with $n$ real eigenvalues. Show that there exists an orthogonal matrix $Q$ such that $Q^TAQ$ is upper triangular. (Hint: Use mathematical induction.)


6.5 Positive definite matrices


We have confirmed that every real symmetric matrix is orthogonally diagonalizable. Therefore, for every quadratic polynomial $p(x) = x^TAx$ with $A$ a real symmetric matrix, there exists a change of variables $x = Py$, where $P$ is an orthogonal matrix, such that

\[ p(Py) = y^TP^TAPy = y^T\operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_n)y = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2, \]

where $\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_n$ are the eigenvalues of $A$. It follows that if every eigenvalue of $A$ is positive, then for every $x \neq 0$ we have $y = P^Tx \neq 0$, hence

\[ p(x) = x^TAx = p(Py) = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 > 0. \]

Conversely, if
\[ p(x) = x^TAx > 0 \]
for every $x \neq 0$, let $A = P\operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_n)P^T$ with $P$ an orthogonal matrix. Then

\[ p(x) = p(Py) = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 > 0 \]

for every $y \neq 0$. Choosing $y$ to be each standard basis vector in turn, it follows that every eigenvalue of $A$ is positive.


Definition 6.5.1. A real symmetric matrix $A$ is said to be positive definite if $x^TAx > 0$ for every $x \neq 0$.


We have shown that 


Theorem 6.5.2. Let $A$ be a real symmetric matrix. Then $A$ is positive definite if and only if every eigenvalue of $A$ is positive.


However, it can be computationally inefficient to work out all eigenvalues 
in order to confirm positive definiteness of a matrix. We have the following 
equivalent conditions: 


Theorem 6.5.3. Let $A$ be a real symmetric $n \times n$ matrix. Then the following are equivalent.

i) $A$ is positive definite.

ii) Every eigenvalue of $A$ is positive.

iii) $A = B^TB$ for a matrix $B$ with independent columns.

iv) All pivots of $A$ obtained without row exchange or scalar multiplication are positive.

v) All $n$ upper left determinants are positive.

Proof: 1) We have shown i) $\Leftrightarrow$ ii) with Theorem 6.5.2.

2) Next we show ii) $\Leftrightarrow$ iii). "ii) $\Rightarrow$ iii)": Let $A = Q\Lambda Q^T$, where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_n)$ and $Q$ is an orthogonal matrix. Let $\Lambda^{1/2} = \operatorname{diag}(\lambda_1^{1/2}, \lambda_2^{1/2}, \lambda_3^{1/2}, \cdots, \lambda_n^{1/2})$ and $B = \Lambda^{1/2}Q^T$. Then we have $A = B^TB$, and the columns of $B$ are linearly independent since $\det(B) \neq 0$.

"ii) $\Leftarrow$ iii)": For every $x \neq 0$, we have $Bx \neq 0$ since the columns of $B$ are linearly independent. Then we have

\[ x^TAx = x^TB^TBx = (Bx)^TBx > 0. \]

That is, $A$ is positive definite. By Theorem 6.5.2, every eigenvalue of $A$ is positive.

3) Then we show iii) $\Leftrightarrow$ iv).

"iii) $\Rightarrow$ iv)": Let $B = QR$ be the QR decomposition of $B$, with $Q$ having orthonormal columns and $R$ an upper triangular matrix. We can make the main diagonal entries of $R$ positive by multiplying the corresponding columns of $Q$ and rows of $R$ by $-1$ if necessary. Then we have $A = B^TB = R^TQ^TQR = R^TR$. That is, $A$ can be written as a product of a lower triangular matrix and an upper triangular matrix with positive main diagonal entries. (We call such a decomposition the Cholesky decomposition.) Therefore, all pivots of $A$ are positive.

"iii) $\Leftarrow$ iv)": If all pivots of $A$ are positive, then by LU decomposition (see Section 2.5), there exist an invertible lower triangular matrix $L$ and a diagonal matrix $D$ with positive main diagonal entries such that

\[ A = LDL^T. \]

Let $B = D^{1/2}L^T$. We have

\[ A = LD^{1/2}D^{1/2}L^T = B^TB, \]

and the columns of $B$ are linearly independent.

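4) Lastly we show v) $\Leftrightarrow$ i). "i) $\Rightarrow$ v)": Denote by $A_k$, $k = 1, 2, \cdots, n$, the $k \times k$ upper left submatrices of $A$. Since $A$ is positive definite, for every $x \neq 0$ we have $x^TAx > 0$. Then for every $x = (x_1, x_2, \cdots, x_k, 0, \cdots, 0) \neq 0$ we have

\[ x^TAx = [x_1 \ x_2 \ \cdots \ x_k]\, A_k \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} > 0. \]

That is, every $A_k$, $k = 1, 2, \cdots, n$, is positive definite. Then by the equivalence of i) and ii) every eigenvalue of $A_k$ is positive. Hence $\det(A_k) > 0$.

"i) $\Leftarrow$ v)": Since $\det(A_1) > 0$, we have the first pivot $p_1 = a_{11} > 0$. Then there exists a lower triangular matrix $E_1$ with 1's on the main diagonal such that

\[ E_1A = \begin{bmatrix} p_1 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{bmatrix}. \]

Letting $E_{12}$ be the $2 \times 2$ upper left submatrix of $E_1$, we have

\[ E_{12}A_2 = \begin{bmatrix} p_1 & * \\ 0 & p_2 \end{bmatrix}. \]

Since $\det(E_{12}A_2) = \det(A_2) > 0$, we have the second pivot $p_2 > 0$. Then there exists a lower triangular matrix $E_2$ with 1's on the main diagonal such that

\[ E_2A = \begin{bmatrix} p_1 & * & * & \cdots & * \\ 0 & p_2 & * & \cdots & * \\ 0 & 0 & * & \cdots & * \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & * & \cdots & * \end{bmatrix}. \]

Letting $E_{22}$ be the $3 \times 3$ upper left submatrix of $E_2$, we have

\[ E_{22}A_3 = \begin{bmatrix} p_1 & * & * \\ 0 & p_2 & * \\ 0 & 0 & p_3 \end{bmatrix}. \]

Since $\det(E_{22}A_3) = \det(A_3) > 0$, we have the third pivot $p_3 > 0$. By the same token we can show that every pivot $p_k$, $k = 1, 2, \cdots, n$, is positive. Hence iv) holds, and therefore so does i).
□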
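
Conditions iv) and v) of Theorem 6.5.3 are easy to test by machine. The sketch below, in exact rational arithmetic, computes the pivots by elimination without row exchanges and the upper left determinants by cofactor expansion. The two test matrices `K` and `N` are hypothetical examples chosen for illustration and do not come from the text, and the function names are likewise assumptions.

```python
from fractions import Fraction as F

def pivots(A):
    """Pivots from elimination without row exchanges or row scaling.
    Returns None if a zero pivot is met, since elimination then stops."""
    M = [[F(x) for x in row] for row in A]
    n = len(M)
    found = []
    for k in range(n):
        p = M[k][k]
        if p == 0:
            return None
        found.append(p)
        for i in range(k + 1, n):
            factor = M[i][k] / p
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return found

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_minors(A):
    """Upper left determinants det(A_1), ..., det(A_n)."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Pivot test of Theorem 6.5.3 iv); A is assumed to be real symmetric."""
    ps = pivots(A)
    return ps is not None and all(p > 0 for p in ps)

K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]  # hypothetical positive definite example
N = [[1, 2], [2, 1]]                        # hypothetical symmetric, indefinite example
print(pivots(K), leading_minors(K))
print(is_positive_definite(K), is_positive_definite(N))
```

For `K` the product of the first $k$ pivots equals the $k$-th upper left determinant, which is the relation used in the proof above.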
1 -1 1 
Example 6.5.4. Let A = |—1 2 0 
1 o 1 


To check the positive definiteness of A, if we solve det(A— AT) = 0 we have 
A3 — 4)? +5\—3 = 0, the roots of which are not trivial to obtain. However, we 
can compute the left upper determinants: det(A;) = 1 > 0, det(42) = 1 > 0, 
det(A3) = det(A) = 3 > 0. Then by Theorem 6.5.3 A is positive definite. 


Exercise 6.5.5. 


0 0 0 10 


1. Let A = A E = i . Find all possible values of t € R such 
v10 0 0 -3 


that A + tI is positive definite.

2. Let
\[ A = \begin{bmatrix} 0 & 0 & 0 & 2 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ 2 & 0 & 0 & -3 \end{bmatrix}. \]
Find all possible values of $t \in \mathbb{R}$ such that $A + tI$ is positive definite.


3. Show that if A and B are positive definite, so is A+ B. 


4. Let $A$ be a symmetric real matrix. Show that if $A$ is positive definite, then $A^{-1}$ is also positive definite.


5. Show that if A and B are similar and A is positive definite, then B is also 
positive definite. 


6. Let A and B be n x n positive definite matrices. Is it true AB and BA are 
positive definite? Justify your answer. (Hint: Use 2 x 2 matrices to construct 
examples.) 


7. Show that $A$ is positive definite if and only if for every $n \in \mathbb{N}$, there exists a positive definite matrix $B$ such that $A = B^n$.

8. Let $A$ be an $n \times n$ real symmetric matrix with eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Show that for every $x \in \mathbb{R}^n$ we have

\[ \lambda_1 x^Tx \le x^TAx \le \lambda_n x^Tx. \]


9. Show that if $A$ is positive definite, then the function $f : \mathbb{R}^n \to \mathbb{R}$ defined by

\[ f(x) = x^TAx - x^Tb \]

achieves its minimum at $x$ with $2Ax = b$. (Hint: Compute $f(y) - f(x)$ for every $y \in \mathbb{R}^n$.)


10. Let $A$ be a real symmetric $n \times n$ matrix. We call $A$ positive semidefinite if $x^TAx \ge 0$ for every $x \neq 0$. Show that every eigenvalue of a positive semidefinite matrix $A$ is nonnegative.

11. Let $A$ be an $n \times n$ matrix. Show that $A$ is a skew-symmetric matrix if and only if $x^TAx = 0$ for every $x \in \mathbb{R}^n$.

12. Let $A$ be an $n \times n$ symmetric matrix. Show that if $x^TAx = 0$ for every $x \in \mathbb{R}^n$, then $A = 0$.

13. Let $A$ be an $n \times n$ symmetric real matrix. Show that there exists $c > 0$ such that

\[ |x^TAx| \le c\, x^Tx \]

for every $x \in \mathbb{R}^n$.

14. Let $A$ be an $n \times n$ symmetric real matrix. Show that there exists $t_0 \in \mathbb{R}$ such that for every $t > t_0$, $A + tI$ is positive definite.

15. Let $A$ be an $n \times n$ symmetric real matrix. Show that there exists $t_0 > 0$ such that for every $0 < t < t_0$, $tA + I$ is positive definite.

16. Let $A$ be an $n \times n$ symmetric real matrix. Show that if $\det(A) < 0$, then there exists $x \in \mathbb{R}^n$ such that $x^TAx < 0$.

17. Let $A$ be an $n \times n$ symmetric real matrix. Show that $A$ is positive definite if and only if for every $k \in \mathbb{N}$, there exists a positive definite matrix $B$ such that $A = B^k$.


18. Let A and B be nx n symmetric real matrices and B be positive definite. 
Show that AB may not be symmetric, but all of the eigenvalues of AB are 
real. 

19. Let $A$ be an $n \times n$ real invertible matrix.

i) Show that there exists a positive definite matrix $B$ such that $A^TA = B^2$;
ii) Show that $AB^{-1}$ is an orthogonal matrix;
iii) Show that every real invertible matrix $A$ can be written as $A = QB$, where $Q$ is orthogonal and $B$ is positive definite.


20. Show that every real invertible matrix A = BQ, where B is positive 
definite and Q is orthogonal. 


7.1 Singular value decomposition


We have learned that every real symmetric square matrix is orthogonally diagonalizable and enjoys the spectral decomposition according to its eigenvalues and eigenvectors. For general nonsymmetric or nonsquare matrices, there is no spectral decomposition anymore. However, making use of the real symmetric matrix $A^TA$, we may develop a generalized spectral theorem in this section.

We begin with the following theorem which reveals important properties 
shared by A and ATA. 


Theorem 7.1.1. Let A be an m x n matrix. Then we have 


i) A and ATA have the same nullspace. 


ii) A and ATA have the same row space. 
iii) A and ATA have the same rank. 


Proof: i) Let $Ax_0 = 0$. Then we have $A^TAx_0 = 0$. Hence $N(A) \subset N(A^TA)$. Conversely, if $A^TAx_0 = 0$, we have that

$Ax_0 \perp$ every column of $A$ (or every row of $A^T$);

$Ax_0$ itself is a linear combination of the columns of $A$ (or of the rows of $A^T$).

So $Ax_0$ must be orthogonal to itself. Hence $Ax_0 = 0$. That is, $N(A) \supset N(A^TA)$. Therefore we have $N(A) = N(A^TA)$.

ii) By Theorem 4.1.4, we have $\mathbb{R}^n = N(A) \oplus R(A) = N(A^TA) \oplus R(A^TA)$, where the direct sums are orthogonal. By i) we have $N(A) = N(A^TA)$. Therefore, we have $R(A^TA) = R(A)$.

iii) By ii), since $R(A) = R(A^TA)$, $A$ and $A^TA$ have the same rank. □
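
The converse inclusion in part i) can also be written as a one-line computation, given here only as a compact restatement of the orthogonality argument, under the same assumption $A^TAx_0 = 0$:

```latex
\[
  \|Ax_0\|^2 = (Ax_0)^T(Ax_0) = x_0^T\,(A^TAx_0) = x_0^T\,0 = 0
  \quad\Longrightarrow\quad Ax_0 = 0 .
\]
```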


We know that $A^TA$ is a real symmetric matrix; therefore, it has the orthogonal decomposition

\[ A^TA = VDV^T, \]

where $D = \operatorname{diag}\{\lambda_1, \lambda_2, \cdots, \lambda_n\}$ is a diagonal matrix whose main diagonal entries are the eigenvalues of $A^TA$, and $V = [v_1 : v_2 : \cdots : v_n]$ is an orthogonal matrix whose columns are the eigenvectors of $A^TA$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_n$, respectively.

For every $x \in \mathbb{R}^n$, and for $y = (y_1, y_2, \cdots, y_n) = V^Tx$, we have

\[ x^TA^TAx = x^TVDV^Tx = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \ge 0. \]

Then all eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_n$ are nonnegative; otherwise we could choose $y$ such that $\lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 < 0$.


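Definition 7.1.2. Let $A$ be an $m \times n$ matrix. Then all eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_n$ of $A^TA$ are nonnegative. We call the numbers

\[ \sigma_1 = \sqrt{\lambda_1}, \quad \sigma_2 = \sqrt{\lambda_2}, \quad \cdots, \quad \sigma_n = \sqrt{\lambda_n} \]

the singular values of $A$.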


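Assume that the rank of $A$ is $k$. Then by Theorem 7.1.1, the rank of $A^TA$ is also $k$. It follows that $D$ also has rank $k$ since $D$ is similar to $A^TA$. That is, after ordering the eigenvalues so that the positive ones come first,

\[ D = \operatorname{diag}\{\lambda_1, \lambda_2, \cdots, \lambda_k, 0, \cdots, 0\}. \]

Notice that
\[ Av_i \cdot Av_j = v_j^TA^TAv_i = \lambda_i\, v_j^Tv_i = \lambda_i\, v_i \cdot v_j. \]

We have
\[ Av_i \cdot Av_j = \lambda_i\, v_i \cdot v_j = \begin{cases} 0 & \text{if } i \neq j, \\ \lambda_i & \text{if } i = j. \end{cases} \]

Therefore, $\{Av_1, Av_2, \cdots, Av_k\}$ is an orthogonal set of vectors in the column space of $A$. Normalization of $\{Av_1, Av_2, \cdots, Av_k\}$ leads to

\[ u_1 = \frac{Av_1}{\|Av_1\|} = \frac{Av_1}{\sqrt{\lambda_1}}, \quad u_2 = \frac{Av_2}{\|Av_2\|} = \frac{Av_2}{\sqrt{\lambda_2}}, \quad \cdots, \quad u_k = \frac{Av_k}{\|Av_k\|} = \frac{Av_k}{\sqrt{\lambda_k}}. \]

That is, $Av_i = \sqrt{\lambda_i}\,u_i$, $i = 1, 2, \cdots, k$. Since by Theorem 7.1.1, $A$ and $A^TA$ have the same null space, we have that $A^TAv_i = 0$, $i = k+1, k+2, \cdots, n$, and hence $Av_i = 0$ for $i = k+1, k+2, \cdots, n$. Therefore, we have

\[ AV = A[v_1 : v_2 : \cdots : v_k : \cdots : v_n] = [u_1 : u_2 : \cdots : u_k : 0 : \cdots : 0] \begin{bmatrix} \sqrt{\lambda_1} & & & & & \\ & \ddots & & & & \\ & & \sqrt{\lambda_k} & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix}. \]

Noticing that $V$ is orthogonal, we have

\[ A = [u_1 : u_2 : \cdots : u_k : 0 : \cdots : 0] \begin{bmatrix} \sqrt{\lambda_1} & & & & & \\ & \ddots & & & & \\ & & \sqrt{\lambda_k} & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix} V^T = \sqrt{\lambda_1}\,u_1v_1^T + \sqrt{\lambda_2}\,u_2v_2^T + \cdots + \sqrt{\lambda_k}\,u_kv_k^T. \]

If we extend the orthonormal set $\{u_1, u_2, \cdots, u_k\}$ into a full orthonormal basis $\{u_1, u_2, \cdots, u_k, u_{k+1}, \cdots, u_m\}$ for $\mathbb{R}^m$ in an arbitrary fashion, we have proved the singular value decomposition theorem.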


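Theorem 7.1.3. Let $A$ be an $m \times n$ real matrix with rank $k$. Then there exist an orthogonal $m \times m$ matrix $U$, an $m \times n$ matrix

\[ \Sigma = \begin{bmatrix} \hat{\Sigma} & 0 \\ 0 & 0 \end{bmatrix}, \]

where $\hat{\Sigma} = \operatorname{diag}\{\sqrt{\lambda_1}, \sqrt{\lambda_2}, \cdots, \sqrt{\lambda_k}\}$ and $\lambda_1, \lambda_2, \cdots, \lambda_k$ are the positive eigenvalues of $A^TA$, and an $n \times n$ orthogonal matrix $V = [v_1 : v_2 : \cdots : v_n]$, such that

\[ A = U\Sigma V^T = \sqrt{\lambda_1}\,u_1v_1^T + \sqrt{\lambda_2}\,u_2v_2^T + \cdots + \sqrt{\lambda_k}\,u_kv_k^T, \]

where the columns of $V = [v_1 : v_2 : \cdots : v_n]$ are eigenvectors of $A^TA$ corresponding to the $n$ nonnegative eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_k, 0, \cdots, 0$, and $U = [u_1 : u_2 : \cdots : u_k : u_{k+1} : \cdots : u_m]$ satisfies

\[ u_i = \frac{Av_i}{\sqrt{\lambda_i}}, \quad i = 1, 2, \cdots, k. \]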

The decomposition in Theorem 7.1.3 is called the full singular value decomposition. According to the partition of $\Sigma$, we can have the following reduced singular value decomposition:


Theorem 7.1.4. Let $A$ be an $m \times n$ real matrix with rank $k$. Then there exist an $m \times k$ matrix $\hat{U}$ with orthonormal columns, a $k \times k$ diagonal matrix $\hat{\Sigma} = \operatorname{diag}\{\sqrt{\lambda_1}, \sqrt{\lambda_2}, \cdots, \sqrt{\lambda_k}\}$, where $\lambda_1, \lambda_2, \cdots, \lambda_k$ are the positive eigenvalues of $A^TA$, and an $n \times k$ matrix $\hat{V} = [v_1 : v_2 : \cdots : v_k]$ with orthonormal columns, such that

\[ A = \hat{U}\hat{\Sigma}\hat{V}^T = \sqrt{\lambda_1}\,u_1v_1^T + \sqrt{\lambda_2}\,u_2v_2^T + \cdots + \sqrt{\lambda_k}\,u_kv_k^T, \]

where the columns of $\hat{V} = [v_1 : v_2 : \cdots : v_k]$ are eigenvectors of $A^TA$ corresponding to the $k$ positive eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_k$, and $\hat{U} = [u_1 : u_2 : \cdots : u_k]$ satisfies

\[ u_i = \frac{Av_i}{\sqrt{\lambda_i}}, \quad i = 1, 2, \cdots, k. \]


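Example 7.1.5. Find a singular value decomposition of

\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}. \]

Solution: We have

\[ A^TA = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \]

which has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$ with corresponding eigenvectors

\[ v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]

These unit eigenvectors form the columns of $V$:

\[ V = [v_1 : v_2] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \]

The singular values of $A$ are $\sigma_1 = \sqrt{\lambda_1} = 1$, $\sigma_2 = \sqrt{\lambda_2} = \sqrt{2}$. Then we have

\[ \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix}. \]

To have $U = [u_1 : u_2 : u_3]$, we first compute

\[ u_1 = \frac{Av_1}{\sqrt{\lambda_1}} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad u_2 = \frac{Av_2}{\sqrt{\lambda_2}} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}. \]

To extend $\{u_1, u_2\}$ into an orthonormal basis for $\mathbb{R}^3$ we need a unit vector $u_3 = (x, y, z)$ such that

\[ u_1^Tu_3 = 0, \quad u_2^Tu_3 = 0, \]

which leads to $y = 0$ and $x + z = 0$. Hence we can choose

\[ u_3 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}. \]

The full singular value decomposition is

\[ A = U\Sigma V^T = \begin{bmatrix} 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^T = 1 \cdot u_1v_1^T + \sqrt{2}\,u_2v_2^T = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix} + \sqrt{2}\begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ 0 \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} 0 & 1 \end{bmatrix}, \]

and the reduced singular value decomposition is

\[ A = \hat{U}\hat{\Sigma}\hat{V}^T = \begin{bmatrix} 0 & \tfrac{1}{\sqrt{2}} \\ 1 & 0 \\ 0 & \tfrac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^T = 1 \cdot u_1v_1^T + \sqrt{2}\,u_2v_2^T. \]
□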
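
The construction $u_i = Av_i/\sqrt{\lambda_i}$ can be traced in a few lines of code. The sketch below assumes the $3 \times 2$ matrix with rows $(0, 1)$, $(1, 0)$, $(0, 1)$, for which $A^TA$ has eigenvalues 1 and 2 with the standard basis vectors as eigenvectors. The eigenpairs are supplied by hand rather than computed, and the function names are illustrative.

```python
import math

def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def svd_terms(A, eigenpairs):
    """Build (sigma_i, u_i, v_i) from positive eigenpairs (lambda_i, v_i) of A^T A,
    using sigma_i = sqrt(lambda_i) and u_i = A v_i / sigma_i."""
    terms = []
    for lam, v in eigenpairs:
        sigma = math.sqrt(lam)
        u = [c / sigma for c in mat_vec(A, v)]
        terms.append((sigma, u, v))
    return terms

def rebuild(terms, m, n):
    """Sum of the rank-one matrices sigma_i * u_i * v_i^T."""
    return [[sum(s * u[i] * v[j] for s, u, v in terms) for j in range(n)]
            for i in range(m)]

A = [[0, 1], [1, 0], [0, 1]]
eigenpairs = [(1, [1, 0]), (2, [0, 1])]  # assumed eigenpairs of A^T A = diag(1, 2)
terms = svd_terms(A, eigenpairs)
B = rebuild(terms, 3, 2)
ok = all(math.isclose(B[i][j], A[i][j], abs_tol=1e-12)
         for i in range(3) for j in range(2))
print(ok)
```

Only the reduced decomposition is rebuilt here; the extra column $u_3$ of the full decomposition multiplies a zero row of $\Sigma$ and does not affect the product.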
Remark 7.1.6. An alternate method to compute the orthogonal matrix $U$ for a singular value decomposition of the $m \times n$ matrix $A = U\Sigma V^T$ is based on the observation that

\[ A^TA = V\Sigma^T\Sigma V^T, \quad AA^T = U\Sigma\Sigma^TU^T, \]

where $\Sigma^T\Sigma$ and $\Sigma\Sigma^T$ share the same set of positive diagonal entries $\lambda_1, \lambda_2, \cdots, \lambda_k$, $k = \operatorname{rank}(A)$, with possibly a different number of zeros. Namely, $U$ can be obtained by solving for the set of unit orthogonal eigenvectors of $AA^T$. □


The Rayleigh quotient


Let $A$ be an $n \times n$ real symmetric matrix. The function $r : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ defined by

\[ r(x) = \frac{x^TAx}{x^Tx} \]

is called the Rayleigh quotient. We are interested in where $r$ assumes its maximum and minimum. Noticing that $x^Tx = \|x\|^2$, we need only consider the extreme values on the unit sphere in $\mathbb{R}^n$. Since $A$ is real symmetric, all eigenvalues are real with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and there exists an orthogonal matrix $P = [v_1 : v_2 : \cdots : v_n]$ such that

\[ P^TAP = \operatorname{diag}[\lambda_1, \lambda_2, \cdots, \lambda_n]. \]

Let $x = Py$ where $\|x\| = 1$. We have $\|y\| = 1$ and

\[
\begin{aligned}
r(x) = r(Py) &= \frac{y^TP^TAPy}{y^TP^TPy} \\
&= \frac{y^TP^TAPy}{y^Ty} \\
&= \frac{\lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2}{y_1^2 + y_2^2 + \cdots + y_n^2} \\
&= \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \\
&\le \lambda_1y_1^2 + \lambda_1y_2^2 + \cdots + \lambda_1y_n^2 \\
&= \lambda_1,
\end{aligned}
\]

where the maximum $\lambda_1$ is achieved at $y = e_1 = (1, 0, \cdots, 0)$ by $r(Py)$. That is, the maximum $\lambda_1$ is achieved by $r(x)$ at $x = Pe_1 = v_1$, which is the first column of $P$ and is the eigenvector of $A$ corresponding to $\lambda_1$. Similarly,

\[
\begin{aligned}
r(x) = r(Py) &= \frac{y^TP^TAPy}{y^TP^TPy} \\
&= \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \\
&\ge \lambda_ny_1^2 + \lambda_ny_2^2 + \cdots + \lambda_ny_n^2 \\
&= \lambda_n,
\end{aligned}
\]

where the minimum $\lambda_n$ is achieved at $y = e_n = (0, 0, \cdots, 0, 1)$ by $r(Py)$. That is, the minimum $\lambda_n$ is achieved by $r(x)$ at $x = Pe_n = v_n$, which is the last column of $P$ and is the eigenvector of $A$ corresponding to $\lambda_n$.

We can also check that if $x$ equals the unit eigenvectors $v_1$ and $v_n$, corresponding to $\lambda_1$ and $\lambda_n$, we have

\[ r(v_1) = \frac{v_1^TAv_1}{v_1^Tv_1} = \lambda_1, \quad r(v_n) = \frac{v_n^TAv_n}{v_n^Tv_n} = \lambda_n. \]

That is, the Rayleigh quotient assumes its maximum and minimum in the directions of the eigenvectors corresponding to the maximum and minimum eigenvalues.

The next question is, are there any more? Namely, can the Rayleigh quotient assume its maximum and minimum at non-eigenvector directions? The answer is no. Note that for $x = (x_1, x_2, \cdots, x_j, \cdots, x_n)$, we have

\[ \frac{\partial r}{\partial x_j} = \frac{e_j^TAx + x^TAe_j}{x^Tx} - \frac{x^TAx\,(e_j^Tx + x^Te_j)}{(x^Tx)^2} = \frac{2(Ax)_j}{x^Tx} - \frac{x^TAx\,(2x_j)}{(x^Tx)^2}. \]

If $r(x)$ assumes an extreme value at $x_0$, we have $\frac{\partial r}{\partial x_j}(x_0) = 0$ for every $j = 1, 2, \cdots, n$. That is,

\[ \frac{2(Ax_0)}{x_0^Tx_0} - \frac{x_0^TAx_0\,(2x_0)}{(x_0^Tx_0)^2} = 0, \]

which is equivalent to
\[ Ax_0 = r(x_0)\,x_0. \]

That is, the Rayleigh quotient assumes extreme values only in the directions of eigenvectors.


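Example 7.1.7. Find the maximum and minimum of $f(x) = x_1^2 + 2x_2^2 + 6x_1x_3 + x_3^2$ on the unit sphere $\|x\| = 1$.


Solution: Since $f(x) = \frac{x^TAx}{x^Tx}$ when $\|x\| = 1$, we can use the results about the Rayleigh quotient to find the maximum and minimum of $f$. We first write $f$ in the quadratic form $f(x) = x^TAx$ with

\[ A = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 2 & 0 \\ 3 & 0 & 1 \end{bmatrix}. \]

The characteristic polynomial is $(2 - \lambda)(1 - \lambda)^2 - 9(2 - \lambda)$. The eigenvalues are $\lambda_1 = 4$, $\lambda_2 = 2$ and $\lambda_3 = -2$.

• The unit eigenvector corresponding to $\lambda_1 = 4$ is $v_1 = \frac{1}{\sqrt{2}}(1, 0, 1)$.

• The unit eigenvector corresponding to $\lambda_3 = -2$ is $v_3 = \frac{1}{\sqrt{2}}(1, 0, -1)$.

$f$ achieves its maximum value $\lambda_1 = 4$ at $v_1$ and its minimum value $\lambda_3 = -2$ at $v_3$. □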
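
The bounds given by the Rayleigh quotient can be probed numerically. The following sketch assumes the symmetric matrix with rows $(1, 0, 3)$, $(0, 2, 0)$, $(3, 0, 1)$ and the eigenvector directions $(1, 0, 1)$ and $(1, 0, -1)$; it evaluates $r$ there and at randomly sampled vectors. The sampling is an illustration, not a proof, and the names are assumptions.

```python
import random

def rayleigh(A, x):
    """Rayleigh quotient r(x) = x^T A x / x^T x for a nonzero vector x."""
    Ax = [sum(a * b for a, b in zip(row, x)) for row in A]
    return sum(a * b for a, b in zip(x, Ax)) / sum(b * b for b in x)

A = [[1, 0, 3], [0, 2, 0], [3, 0, 1]]
r_max = rayleigh(A, [1, 0, 1])    # eigenvector direction for the largest eigenvalue
r_min = rayleigh(A, [1, 0, -1])   # eigenvector direction for the smallest eigenvalue

random.seed(0)  # fixed seed so that the sampling is reproducible
samples = [[random.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
inside = all(r_min - 1e-12 <= rayleigh(A, x) <= r_max + 1e-12 for x in samples)
print(r_max, r_min, inside)
```

Because $r(cx) = r(x)$ for every nonzero scalar $c$, the eigenvector directions do not need to be normalized before they are passed to `rayleigh`.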
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Exercise 7.1.8. 


1. Find a singular value decomposition of A = E : o 
2. Find a singular value decomposition of
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}. \]
3. Let
\[ A = \begin{bmatrix} 2 & -2 & 0 \\ -2 & 1 & -2 \\ 0 & -2 & 0 \end{bmatrix}. \]
Find the vectors $x_{\max}$, $x_{\min}$ in
\[ \{x \in \mathbb{R}^3 : x^Tx = 1\}, \]
where $f(x) = x^TAx$ assumes its maximum and minimum, respectively.
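4. Let
\[ A = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}. \]
Find the vectors $x_{\max}$, $x_{\min}$ in
\[ \{x \in \mathbb{R}^3 : x^Tx = 1\}, \]
where $f(x) = x^TAx$ assumes its maximum and minimum, respectively.

5. Let $u = [1 \ 1 \ \cdots \ 1]$ be a $1 \times n$ matrix and $A = I + u^Tu$. Find the vectors $x_{\max}$, $x_{\min}$ in
\[ \{x \in \mathbb{R}^n : x^Tx = 1\}, \]
where $f(x) = x^TAx$ assumes its maximum and minimum, respectively.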


6. Show that for every $n \times n$ matrix $A$, there exists an orthogonal matrix $Q$ such that
\[ A^TA = Q^TAA^TQ. \]

7. Let $A$ be an $n \times n$ real matrix. Let $R$ be defined for every nonzero $x \in \mathbb{R}^n$ by

\[ R(x) = \frac{\|Ax\|}{\|x\|}. \]

Find the maximum and minimum of $R$.

8. Let $A = U_kDV_k^T = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \cdots + \sigma_ku_kv_k^T$ be a singular value decomposition of the $m \times n$ real matrix $A$, where the matrices $U_k = [u_1 : u_2 : \cdots : u_k]$ and $V_k = [v_1 : v_2 : \cdots : v_k]$ both have orthonormal columns, and

\[ D = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix} \]

is a $k \times k$ diagonal matrix with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.

i) Explain that $A$ and $U_k$ have the same column space.

ii) Explain that for every $b \in \mathbb{R}^m$, $\hat{b} = U_kU_k^Tb$ is the orthogonal projection of $b$ onto the column space of $A$.

iii) Verify that $\hat{x} = V_kD^{-1}U_k^Tb$ is a least squares solution of $Ax = b$.


9. Let A be an n x n matrix. Show that there exists a positive semidefinite 
matrix P and an orthogonal matrix Q such that A = PQ. We call the A = PQ 
a polar decomposition. 


10. Let A be an n x n positive definite matrix. Show that for every n x n 
matrix B, BTAB and B have the same rank. 


7.2 Principal component analysis 


Example 7.2.1. Let $X = [X_1 : X_2 : \cdots : X_n]$ be an $m \times n$ matrix with each column an observation vector of a measurement made on an object $x$. For example, a survey of $n = 1000$ families on the vector $x = (x_1, x_2, \cdots, x_8)$, where $x_i$, $i = 1, 2, \cdots, 8$, stand for annual income and annual expenses on cars, computers, food, medicine, insurance, education and entertainment, respectively, may be written as an $8 \times 1000$ matrix $X$. Then row $i$ of $X$ is a set of observations on the variable $x_i$ contained in $x$.

One may analyze the data contained in $X$ for different purposes. For example, we may investigate how annual income is correlated with annual expenses on insurance and medicine. Here we assume the underlying correlation is linear, so that we may use the notion of linear dependency to study the relation among row 1, row 5 and row 6 of $X$. □


For convenience, we translate the observed data to have zero mean, by setting

\[ \hat{X}_i = X_i - M, \quad i = 1, 2, \cdots, n, \]

where $M = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the mean of the columns of $X$. This is analogous to translating a graph so that its center is at the origin. Then the columns of

\[ \hat{X} = [\hat{X}_1 : \hat{X}_2 : \cdots : \hat{X}_n] \]

have zero mean. We say that $\hat{X}$ is in mean-deviation form. Recall that the dot product $x \cdot y$ with $x, y \in \mathbb{R}^n$ is $x \cdot y = \|x\| \cdot \|y\| \cos\theta$, where $\theta$ is defined to be the angle between $x$ and $y$. We observe that

\[ \cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|} \]

implies that if $\cos\theta = 0$, then $x \perp y$ and $x$, $y$ are linearly independent; if $\cos\theta = \pm 1$, then $x$, $y$ are co-linear and are linearly dependent. Therefore, the dot product $x \cdot y$ may be regarded as a measure of how much $x$ and $y$ are correlated. For instance, if the first row $r_1$ of $\hat{X}$ is the annual income and the second row $r_2$ the annual expenses on cars, then $r_1 \cdot r_2$ gives an indication of how annual income is correlated with annual expenses on cars.

We call the matrix

\[ S = \frac{1}{n-1}\hat{X}\hat{X}^T \]

the sample covariance matrix, where $S_{ij}$ measures how much the $i$-th row of $\hat{X}$ is correlated to the $j$-th row of $\hat{X}$. We call

\[ S_{jj} = \frac{1}{n-1}(\hat{X}\hat{X}^T)_{jj} \]

the variance of the $j$-th variable $x_j$ of the object vector $x$. The variance of $x_j$ measures the spread of the values of $x_j$ around the zero mean.

The trace of $S$,

\[ \operatorname{trace}(S) = \sum_{j=1}^{m} S_{jj} = \frac{1}{n-1}\sum_{j=1}^{m}(\hat{X}\hat{X}^T)_{jj}, \]

is called the total variance of the data $X$.

In this section, we introduce the so-called principal component analysis, which is a procedure that uses an orthogonal transformation $x = Py$ to convert the observations of $x$, whose variables are possibly correlated, into a set $y$ of linearly uncorrelated variables. We call the linearly uncorrelated variables of $y$ the principal components.

To wit, we assume that the $m \times n$ matrix $X = [x_1 : x_2 : \cdots : x_n]$ has zero mean and $S = \frac{1}{n-1}XX^T$ is the covariance matrix. We find an orthogonal $m \times m$ matrix $P$ such that the change of variables

\[ x = Py \]

has the property that the new variables $y_1, y_2, \cdots, y_m$ of $y$ are uncorrelated. That is, the row of the values for $y_i$ is orthogonal to that for $y_j$ if $i \neq j$. Let $Y = [y_1 : y_2 : \cdots : y_n] = P^TX$, where $y_j = P^Tx_j$ is the $j$-th transformed observation. We want the covariance matrix

\[
\begin{aligned}
\frac{1}{n-1}YY^T &= \frac{1}{n-1}[P^Tx_1 : P^Tx_2 : \cdots : P^Tx_n][P^Tx_1 : P^Tx_2 : \cdots : P^Tx_n]^T \\
&= \frac{1}{n-1}P^TXX^TP \\
&= P^TSP
\end{aligned}
\]

to be diagonal. Since $S = \frac{1}{n-1}XX^T$ is real symmetric, such an orthogonal matrix $P$ exists. Moreover, since $S$ is positive semidefinite (i.e., $x^TSx \ge 0$ for every $x \neq 0$, see Exercise 6.5.5), the eigenvalues satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$. Let

\[ D = \operatorname{diag}\{\lambda_1, \lambda_2, \cdots, \lambda_m\}. \]

Then $P^TSP = D$. That is, $SP = PD$. The unit column vectors of $P = [p_1 : p_2 : \cdots : p_m]$ are eigenvectors of the covariance matrix $S$ and are called the (directions of) principal components of the data of observations. We call the eigenvector corresponding to the largest eigenvalue the first principal component, the eigenvector corresponding to the second largest eigenvalue the second principal component, and so on. Moreover, $y$ with

\[ y_i = p_i^Tx, \quad i = 1, 2, \cdots, m, \]

becomes the new variable with uncorrelated coordinates, and $\lambda_i$ measures the variance of the new variable $y_i$.

Notice that by Theorem 6.2.6, $\operatorname{trace}(S) = \operatorname{trace}(P^TSP)$. The total variance of $S$ is not changed after orthogonal diagonalization. Then the fraction

\[ \frac{\lambda_i}{\operatorname{trace}(S)} = \frac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_m} \]

indicates the portion of variance contributed by the $i$-th principal component $p_i$.
Example 7.2.2. Let the following table list the data of the annual income and annual living expenses of five families (in thousand dollars):

Family             1     2     3     4     5
Income            70   125   125   135   250
Living expenses   50    60    61    64    70

i) Find the covariance matrix for the data;

ii) Make a principal component analysis of the data to find all the principal components;

iii) Find the total variance $T$ in the data, and the fraction contributed by the first principal component.

Solution: i) Let $X_i$, $i = 1, 2, \cdots, 5$, denote the $i$-th column of the table of the data. The sample mean vector is

\[ m = \frac{1}{5}\sum_{i=1}^{5} X_i = \begin{bmatrix} 141 \\ 61 \end{bmatrix}. \]

Let $A = [X_1 - m : X_2 - m : \cdots : X_5 - m]$. Then

\[ A = \begin{bmatrix} -71 & -16 & -16 & -6 & 109 \\ -11 & -1 & 0 & 3 & 9 \end{bmatrix}. \]

The covariance matrix $S$ is then

\[ S = \frac{1}{5-1}AA^T = \frac{1}{4}\begin{bmatrix} 17470 & 1760 \\ 1760 & 212 \end{bmatrix}. \]

ii) The eigenvalues of $S$ are
\[ \lambda_1 \approx 4411.91, \quad \lambda_2 \approx 8.5853. \]
The corresponding unit eigenvectors are

\[ u_1 \approx \begin{bmatrix} 0.9949 \\ 0.1004 \end{bmatrix}, \quad u_2 \approx \begin{bmatrix} 0.1004 \\ -0.9949 \end{bmatrix}, \]

where $u_1$ is the first principal component and $u_2$ the second principal component.

iii) The total variance $T$ is

\[ T = \lambda_1 + \lambda_2 = 4420.5 = \operatorname{trace}(S). \]

The fraction contributed by the first principal component is $\frac{\lambda_1}{T} \approx 99.81\%$. □
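
The steps of a principal component analysis on a two-variable data set can be sketched as follows. The code recomputes everything from the raw table of five incomes and living expenses, so its output can be compared against a hand computation. The closed-form eigenvalue formula is specific to symmetric $2 \times 2$ matrices, and the function names are illustrative assumptions.

```python
import math

income = [70, 125, 125, 135, 250]
expenses = [50, 60, 61, 64, 70]

def centered(row):
    """Subtract the row mean, giving the mean-deviation form of one variable."""
    mean = sum(row) / len(row)
    return [value - mean for value in row]

def covariance(rows):
    """Sample covariance matrix S = X X^T / (n - 1) of the centered rows."""
    n = len(rows[0])
    dev = [centered(r) for r in rows]
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in dev] for ri in dev]

def eig_sym_2x2(S):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix via the quadratic formula."""
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

S = covariance([income, expenses])
lam1, lam2 = eig_sym_2x2(S)

# First principal component: unit solution of (S - lam1 I) u = 0.
ratio = (lam1 - S[0][0]) / S[0][1]
norm = math.hypot(1.0, ratio)
u1 = [1.0 / norm, ratio / norm]

fraction = lam1 / (lam1 + lam2)
print(S, lam1, lam2, u1, fraction)
```

For larger data sets one would normally call a library eigenvalue routine instead of the $2 \times 2$ formula; the structure of the computation (center, form $S$, diagonalize, compare eigenvalues with the trace) stays the same.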


Exercise 7.2.3. 


1. The following table lists the data of the scores of five students:

Student     1    2    3    4    5
Exam I     70   75   85   95   60
Exam II    50   60   75   65   70
Exam III   76   59   61   80   80


i) Find the covariance matrix for the data; 


ii) Make a principal component analysis of the data to find all the principal 
components; 


iii) Find the total variance T in the data, and the fraction contributed by the 
first principal component.

2. Let $X = [x_1 : x_2 : \cdots : x_n]$ be an $m \times n$ matrix with zero mean and $P$ an $m \times m$ invertible matrix. Show that $Y = [P^Tx_1 : P^Tx_2 : \cdots : P^Tx_n]$ has zero mean.


8.1 Linear transformation and matrix representation
8.2 Range and null spaces of linear transformation
8.3 Invariant subspaces
8.4 Decomposition of vector spaces
8.5 Jordan normal form
8.6 Computation of Jordan normal form


A fundamental theme of many branches of mathematics is the study of functions, or transformations, between vector spaces. A function can be first classified as linear or nonlinear. For example, $f : \mathbb{R} \to \mathbb{R}$ with $f(x) = 2x$ is a linear function, while $g : \mathbb{R} \to \mathbb{R}$ with $g(x) = x^2$ is a nonlinear function. A linear function can be further classified as homogeneous linear or nonhomogeneous linear. For example, $f : \mathbb{R} \to \mathbb{R}$ with $f(x) = 2x$ is homogeneous linear while $h : \mathbb{R} \to \mathbb{R}$ with $h(x) = 2x + 1$ is nonhomogeneous linear. As an important application of matrix theory, we devote this chapter to a brief discussion of homogeneous linear functions between vector spaces. In what follows, if no confusion arises, we say that a function is linear when it is homogeneous linear.


8.1 Linear transformation and matrix representation 


Definition 8.1.1. If $T : V \to W$ is a function, where $V$ and $W$ are vector spaces, such that for every $u, v \in V$ and scalar $k$,

\[ T(u + v) = T(u) + T(v), \]

\[ T(ku) = kT(u), \]

then we call $T : V \to W$ a linear transformation. If, in addition, $V = W$, we call $T$ a linear operator on $V$.


Example 8.1.2. Let $A$ be an $m \times n$ real matrix. Then the map $T : \mathbb{R}^n \to \mathbb{R}^m$ defined by $T(x) = Ax$ is a linear transformation, since for every $u, v \in \mathbb{R}^n$ and scalar $k \in \mathbb{R}$,

\[ T(u + v) = A(u + v) = Au + Av = T(u) + T(v), \]
\[ T(ku) = A(ku) = kAu = kT(u). \]
□


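Example 8.1.3. Let $A = [c_1 : c_2 : \cdots : c_n]$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = n$. Then the orthogonal projection $P : \mathbb{R}^m \to \mathbb{R}^m$ onto the column space of $A$ is a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^m$. Indeed, for every $b \in \mathbb{R}^m$, we have the orthogonal projection $Pb = A(A^TA)^{-1}A^Tb$. Then for every $u, v \in \mathbb{R}^m$ and scalar $k \in \mathbb{R}$,

\[ P(u + v) = A(A^TA)^{-1}A^T(u + v) = Pu + Pv, \]
\[ P(ku) = kP(u). \]
So $P$ is a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^m$. □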


Example 8.1.4. Let $P_n$ denote the vector space of all real polynomials with degree less than or equal to $n$. The map $T : P_2 \to P_3$ defined by

\[ T(p)(x) = xp(x) \]
is a linear transformation. Indeed, for every $f, g \in P_2$ and $k \in \mathbb{R}$, we have
\[ T(f + g)(x) = x(f(x) + g(x)) = xf(x) + xg(x) = T(f)(x) + T(g)(x), \]
\[ T(kf)(x) = x(kf(x)) = k(xf(x)) = kT(f)(x). \]
Therefore $T$ is a linear transformation. □


Example 8.1.5. Let $V$ be a real vector space with $\dim V = n$ and a basis $\{v_1, v_2, \cdots, v_n\}$. The coordinate map $[\,] : V \to \mathbb{R}^n$ defined by

\[ [\,](u) = [u]_V \]

is a linear transformation. Indeed, for every $u, v \in V$ and scalar $k \in \mathbb{R}$, we have

\[
\begin{aligned}
[\,](u + v) &= [u + v]_V \\
&= \big[[v_1 : v_2 : \cdots : v_n][u]_V + [v_1 : v_2 : \cdots : v_n][v]_V\big]_V \\
&= \big[[v_1 : v_2 : \cdots : v_n]([u]_V + [v]_V)\big]_V \\
&= [u]_V + [v]_V \\
&= [\,](u) + [\,](v),
\end{aligned}
\]
\[
\begin{aligned}
[\,](ku) &= [ku]_V \\
&= \big[k[v_1 : v_2 : \cdots : v_n][u]_V\big]_V \\
&= \big[[v_1 : v_2 : \cdots : v_n](k[u]_V)\big]_V \\
&= k[u]_V \\
&= k[\,](u).
\end{aligned}
\]

That is, the coordinate map $[\,] : V \to \mathbb{R}^n$ is a linear transformation. □


Remark 8.1.6. Recall that a function $f : A \to B$ is said to be one to one, or injective, if for every $x_1, x_2 \in A$ with $f(x_1) = f(x_2)$ we have $x_1 = x_2$. $f : A \to B$ is said to be onto, or surjective, if for every $y \in B$ there exists $x \in A$ such that $y = f(x)$. If $f : A \to B$ is both injective and surjective, $f$ is called a bijection.

Using the fact that a basis of a vector space $V$ with $\dim V = n$ is linearly independent and spans $V$, we can show that the coordinate map $[\,] : V \to \mathbb{R}^n$ is a one-to-one and onto map. □


Theorem 8.1.7. Let $T : V \to W$ be a function from a vector space $V$ to a vector space $W$ with $\dim V = n$ and $\dim W = m$. Let $S = \{v_1, v_2, \cdots, v_n\}$ be a basis of $V$ and $L = \{w_1, w_2, \cdots, w_m\}$ be a basis of $W$. $T$ is a linear transformation if and only if there exists an $m \times n$ matrix $A$ such that for every $x \in V$,

\[ [T(x)]_L = A[x]_S. \]

Moreover, if $T$ is a linear transformation, then

\[ A = \big[[T(v_1)]_L : [T(v_2)]_L : \cdots : [T(v_n)]_L\big]. \]
Proof. "$\Rightarrow$" Let $\{v_1, v_2, \cdots, v_n\}$ be a basis of $V$. For every $x \in V$, there exists a coordinate vector $[x]_S = (x_1, x_2, \cdots, x_n) \in \mathbb{R}^n$ such that
\[ x = [v_1 : v_2 : \cdots : v_n][x]_S. \]

Then by linearity of $T$ we have

\[
\begin{aligned}
T(x) &= T([v_1 : v_2 : \cdots : v_n][x]_S) \\
&= T(x_1v_1 + x_2v_2 + \cdots + x_nv_n) \\
&= x_1T(v_1) + x_2T(v_2) + \cdots + x_nT(v_n) \\
&= x_1[w_1 : w_2 : \cdots : w_m][T(v_1)]_L + x_2[w_1 : w_2 : \cdots : w_m][T(v_2)]_L + \cdots + x_n[w_1 : w_2 : \cdots : w_m][T(v_n)]_L \\
&= [w_1 : w_2 : \cdots : w_m]\big(x_1[T(v_1)]_L + x_2[T(v_2)]_L + \cdots + x_n[T(v_n)]_L\big) \\
&= [w_1 : w_2 : \cdots : w_m]\big[[T(v_1)]_L : [T(v_2)]_L : \cdots : [T(v_n)]_L\big][x]_S.
\end{aligned}
\]

That is,
\[ [T(x)]_L = A[x]_S, \]
where
\[ A = \big[[T(v_1)]_L : [T(v_2)]_L : \cdots : [T(v_n)]_L\big]. \]

"$\Leftarrow$" If for every $x \in V$,
\[ [T(x)]_L = A[x]_S, \]
then we have
\[
\begin{aligned}
T(x) &= [w_1 : w_2 : \cdots : w_m][T(x)]_L \\
&= [w_1 : w_2 : \cdots : w_m]A[x]_S.
\end{aligned}
\]

For every $u, v \in V$ and scalar $k$ we have

\[
\begin{aligned}
T(u + v) &= [w_1 : w_2 : \cdots : w_m]A[u + v]_S \\
&= [w_1 : w_2 : \cdots : w_m]A([u]_S + [v]_S) \\
&= T(u) + T(v), \\
T(ku) &= [w_1 : w_2 : \cdots : w_m]A[ku]_S \\
&= [w_1 : w_2 : \cdots : w_m]Ak[u]_S \\
&= kT(u).
\end{aligned}
\]
That is, $T$ is a linear transformation. □


Remark 8.1.8. Fix a basis $S = \{v_1, v_2, \cdots, v_n\}$ of $V$ and a basis $L = \{w_1, w_2, \cdots, w_m\}$ of $W$. We call the matrix $A = \big[[T(v_1)]_L : [T(v_2)]_L : \cdots : [T(v_n)]_L\big]$ defined in Theorem 8.1.7 the representation matrix of the transformation $T : V \to W$ with respect to the bases $S$ and $L$, and denote it by $[T]$ when the bases of $V$ and $W$ are clear.

When $T : V \to W$ is a linear transformation with $V = W$, we call $T$ a linear operator on $V$. In this case we usually choose the same basis $S$ for both the domain and the codomain of $T$ and use $[T]_S$ to denote the representation matrix of $T$. □


Theorem 8.1.7 shows that a linear transformation between finite dimensional vector spaces can be completely described by a matrix multiplication of the coordinate vectors. Note also that the coordinate map $[\,] : V \to \mathbb{R}^n$ defined by $[\,](x) = [x]_V$ is a bijection. Thus properties of linear transformations can be investigated by studying the representation matrix.


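Example 8.1.9. Let $T : P_2 \to P_3$ be defined by

\[
T(p)(x) = x\,p(x).
\]

Then by Example 8.1.4 $T$ is a linear transformation. Let $\{1, x, x^2\}$ be a basis of $P_2$ and $\{1, x, x^2, x^3\}$ be a basis of $P_3$. Then the representation matrix $A$ of $T$ is

\[
A = \big[[T(1)]_{P_3} : [T(x)]_{P_3} : [T(x^2)]_{P_3}\big]
= \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Let $f(x) = 1 + x^2$. Then we have

\[
[T(f)]_{P_3} = A[f]_{P_2}
= \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}.
\]

Therefore we have

\[
T(f)(x) = x + x^3.
\]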
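The computation in Example 8.1.9 can be checked mechanically. The following is a minimal sketch in plain Python, assuming polynomials are stored as coefficient lists in the monomial basis; the helper names `mult_by_x_matrix` and `matvec` are hypothetical and chosen only for this illustration.

```python
def mult_by_x_matrix(n):
    """Matrix of T(p)(x) = x*p(x) from P_n to P_{n+1} in the monomial bases.

    Column j holds the coordinates of T(x^j) = x^(j+1).
    """
    rows, cols = n + 2, n + 1
    A = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        A[j + 1][j] = 1
    return A


def matvec(A, v):
    """Multiply the matrix A (list of rows) by the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = mult_by_x_matrix(2)        # representation matrix of T : P_2 -> P_3
f_coords = [1, 0, 1]           # f(x) = 1 + x^2 in the basis {1, x, x^2}
Tf_coords = matvec(A, f_coords)

print(A)          # [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(Tf_coords)  # [0, 1, 0, 1], i.e. T(f)(x) = x + x^3
```

The point of the sketch is that applying $T$ to a polynomial reduces to one matrix-vector product on coordinate vectors, exactly as Theorem 8.1.7 states.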


Exercise 8.1.10. 


1. Determine whether each of the following functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a linear transformation.


i) $f(x) = 1$, $x \in \mathbb{R}$;

ii) f(x) =2,c €R; 

iii) $f(x) = x + 1$, $x \in \mathbb{R}$;

iv) f(x) = 22,2 €R; 

v) $f(x) = x_1 + x_2$, $x = (x_1, x_2) \in \mathbb{R}^2$;
vi) $f(x) = (2x_1, x_2)$, $x = (x_1, x_2) \in \mathbb{R}^2$;
vii) f(x) = (x, 21), x = (z1, £2) € R?; 

viii) $f(x) = (0, 0)$, $x = (x_1, x_2) \in \mathbb{R}^2$.


2. Let $T : P_2 \to P_2$ be defined by

\[
T(p)(x) = p(x) - p(0).
\]

i) Show that $T$ is a linear transformation;

ii) Find the representation matrix of $T$ under the standard basis of $P_2$ and find its rank;

iii) Determine whether or not $T$ is injective and whether or not $T$ is surjective.

3. Let $V$ be a vector space with $\dim V = n$. Let $B_V = \{v_1, v_2, \cdots, v_n\}$ be a basis of $V$. Show that for every set $\{x_1, x_2, \cdots, x_n\}$ in $V$, there exists a unique linear transformation $T : V \to V$ such that $T(v_i) = x_i$, $i = 1, 2, \cdots, n$.


4. Let $M_{nn}(\mathbb{R})$ be the vector space of all $n \times n$ real matrices. Let $B \in M_{nn}(\mathbb{R})$ and define $T : M_{nn}(\mathbb{R}) \to M_{nn}(\mathbb{R})$ by

\[
T(A) = AB - BA.
\]

i) Show that $T$ is a linear operator.

ii) Find the matrix representation $[T]$ of $T$ under the standard basis of $M_{nn}(\mathbb{R})$.

iii) Show that for every $A, C \in M_{nn}(\mathbb{R})$, $T(AC) = AT(C) + T(A)C$.


5. Let $V$ be a vector space with $\dim V = n$ and $T : V \to V$ a linear operator on $V$. Show that if $W \subseteq V$ is a subspace of $V$, then

\[
\dim(TV) + \dim W - \dim(TW) \le n.
\]


6. Let $V$ be a vector space and $T : V \to V$ a linear operator on $V$. We say that $T$ is an invertible linear transformation if there exists an operator $S$ on $V$ such that $TS = ST = I$, where $I$ is the identity map. Let $\{e_1, e_2, \cdots, e_n\}$ be a basis of $V$.

i) Show that $T$ is invertible if and only if $\{Te_1, Te_2, \cdots, Te_n\}$ is linearly independent;

ii) Show that $T$ is invertible if and only if $T$ is one to one;

iii) Show that $T$ is invertible if and only if $T$ is onto.


8.2 Range and null spaces of linear transformation 


Definition 8.2.1. Let $T : V \to W$ be a linear transformation where $V$ and $W$ are vector spaces. We call

\[
\ker(T) = \{x \in V : Tx = 0\}
\]

the kernel of $T$, and call

\[
\mathrm{Range}(T) = \{Tx : x \in V\}
\]

the range of $T$.


By linearity of $T$, we have that for every $x_1, x_2 \in \ker(T)$ and for all scalars $k_1$ and $k_2$,

\[
T(k_1 x_1 + k_2 x_2) = T(k_1 x_1) + T(k_2 x_2) = k_1 T(x_1) + k_2 T(x_2) = 0.
\]

That is, $k_1 x_1 + k_2 x_2 \in \ker(T)$. Therefore, $\ker(T)$ is a subspace of $V$. Similarly, for every $y_1, y_2 \in \mathrm{Range}(T)$ and for all scalars $k_1$ and $k_2$, there exist $x_1, x_2 \in V$ such that $T(x_1) = y_1$, $T(x_2) = y_2$ and

\[
k_1 y_1 + k_2 y_2 = k_1 T(x_1) + k_2 T(x_2) = T(k_1 x_1 + k_2 x_2).
\]

That is, $k_1 y_1 + k_2 y_2 \in \mathrm{Range}(T)$. Therefore, $\mathrm{Range}(T)$ is a subspace of $W$. We arrive at


Theorem 8.2.2. Let T : V — W be a linear transformation where V 


and W are vector spaces. Then ker(T) is a subspace of V and Range(T) 
is a subspace of W. 


Let $T : V \to W$ be a linear transformation where $V$ and $W$ are vector spaces with $\dim V = n$ and $\dim W = m$. Let $\{v_1, v_2, \cdots, v_n\}$ be a basis of $V$ and $\{w_1, w_2, \cdots, w_m\}$ be a basis of $W$. Then by Theorem 8.1.7, there exists an $m \times n$ matrix $A$ such that $[T(x)]_W = A[x]_V$. Then we have

\[
\begin{aligned}
\ker(T) &= \{x \in V : Tx = 0\} \\
&= \{x \in V : [w_1 : w_2 : \cdots : w_m][Tx]_W = 0\} \\
&= \{x \in V : [Tx]_W = 0\} \\
&= \{x \in V : A[x]_V = 0\}.
\end{aligned}
\]


That is, 


The kernel of T is the set of all vectors in V whose coordinate vectors 


are in N(A). 


Since the coordinate map $[\,] : V \to \mathbb{R}^n$ is a linear bijection by Example 8.1.5 and Remark 8.1.6, we have

\[
\dim \ker(T) = \dim N(A).
\]

Similarly, we have

\[
\begin{aligned}
\mathrm{Range}(T) &= \{Tx : x \in V\} \\
&= \{[w_1 : w_2 : \cdots : w_m][Tx]_W : x \in V\} \\
&= \{[w_1 : w_2 : \cdots : w_m]A[x]_V : x \in V\}.
\end{aligned}
\]


We have dim Range(T) = dim C(A). That is, 


The range of T is the set of all vectors in W whose coordinate vectors 


are in C(A). 


Recall that for an $m \times n$ real matrix $A$, we have $\dim N(A) + \dim C(A) = n$. For linear transformations we have the following similar result.


Theorem 8.2.3. Let $T : V \to W$ be a linear transformation where $V$ and $W$ are vector spaces with $\dim V = n$ and $\dim W = m$. Then we have

\[
\dim \ker(T) + \dim \mathrm{Range}(T) = n.
\]
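Since $\dim\ker(T) = \dim N(A)$ and $\dim\mathrm{Range}(T) = \dim C(A)$, the identity can be checked numerically on any representation matrix. The following is a minimal sketch in plain Python with exact fractions; the matrix `A` and the helpers `rref` and `null_space_basis` are hypothetical choices made for this illustration only.

```python
from fractions import Fraction


def rref(M):
    """Return (reduced row echelon form, list of pivot columns) of M."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        pivot = A[r][c]
        A[r] = [x / pivot for x in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return A, pivots


def null_space_basis(M):
    """One basis vector of N(M) per free column of the echelon form."""
    R, pivots = rref(M)
    cols = len(M[0])
    basis = []
    for f in (c for c in range(cols) if c not in pivots):
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]
        basis.append(v)
    return basis


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]     # second row is twice the first
rank = len(rref(A)[1])                    # dim C(A) = dim Range(T)
nullity = len(null_space_basis(A))        # dim N(A) = dim ker(T)
print(rank, nullity, rank + nullity == len(A[0]))   # 2 1 True
```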


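Example 8.2.4. (Example 8.1.9 revisited) Let $T : P_2 \to P_3$ be defined by

\[
T(p)(x) = x\,p(x).
\]

Then the representation matrix $A$ of $T$ is

\[
A = \big[[T(1)]_{P_3} : [T(x)]_{P_3} : [T(x^2)]_{P_3}\big]
= \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Then we have $N(A) = \{0\} \subset \mathbb{R}^3$ and

\[
\ker(T) = \{0\} \subset P_2.
\]

Moreover, $C(A) = \{(0, x) \in \mathbb{R}^4 : x \in \mathbb{R}^3\}$. Therefore we have

\[
\mathrm{Range}(T) = \left\{ \begin{bmatrix} 1 & x & x^2 & x^3 \end{bmatrix}
\begin{bmatrix} 0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} : (x_1, x_2, x_3) \in \mathbb{R}^3 \right\}
= \mathrm{span}\{x, x^2, x^3\}.
\]

□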


Theorem 8.2.5. Let $V$ and $W$ be vector spaces, $T : V \to W$ a linear transformation. Then $T$ is one to one if and only if $\ker(T) = \{0\}$.

Proof. "$\Rightarrow$" Suppose $\ker(T) \neq \{0\}$. Then there exists $x_1 \neq 0$, $x_1 \in V$ such that $T(x_1) = 0 = T(0)$. Hence $T$ is not one to one. This is a contradiction.

"$\Leftarrow$" Let $T(x_1) = T(x_2)$, $x_1, x_2 \in V$. Then we have $T(x_1 - x_2) = 0$ and $x_1 = x_2$ since $\ker(T) = \{0\}$. □


By Theorems 8.2.3 and 8.2.5, we have that 


Theorem 8.2.6. Let T : V — W be a linear transformation where V 


and W are vector spaces with dim V = dim W = n. Then T is one to 
one if and only if T is onto. 


Let $T : V \to V$ be a linear operator on $V$. It turns out that the representation matrices of $T$ with respect to different bases are similar. To be more specific, we have


Theorem 8.2.7. Let $V$ be a vector space with $\dim V = n$. Let $T : V \to V$ be a linear transformation. Let $S = \{v_1, v_2, \cdots, v_n\}$ and $L = \{w_1, w_2, \cdots, w_n\}$ be bases of $V$. Then we have

\[
[T]_L = P^{-1}[T]_S P,
\]

where $P$ is the transition matrix from basis $L$ to $S$.


Proof. Recall that $[w_1 : w_2 : \cdots : w_n] = [v_1 : v_2 : \cdots : v_n]P$ and hence $[x]_S = P[x]_L$. For every $x \in V$, we have

\[
\begin{aligned}
Tx &= [v_1 : v_2 : \cdots : v_n][Tx]_S \\
&= [v_1 : v_2 : \cdots : v_n][T]_S[x]_S \\
&= [w_1 : w_2 : \cdots : w_n]P^{-1}[T]_S P[x]_L.
\end{aligned}
\]

We also have

\[
Tx = [w_1 : w_2 : \cdots : w_n][Tx]_L = [w_1 : w_2 : \cdots : w_n][T]_L[x]_L.
\]

Then we have

\[
P^{-1}[T]_S P[x]_L = [T]_L[x]_L.
\]

Since $x$ is arbitrary, we have $[T]_L = P^{-1}[T]_S P$. □


Example 8.2.8. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be a linear operator which has the representation matrix

\[
[T]_L = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix},
\]


under the basis $L = \{\eta_1, \eta_2, \eta_3\}$ where


—1 0 —1 
m=| 0|, n= |-1|, n= j| 1 
1 1 0 
Find the representation matrix $[T]_S$, where $S = \{e_1, e_2, e_3\}$ is the standard basis of $\mathbb{R}^3$.


Solution: The transition matrix from L to S is 


Q = [m : m : n] 
—1 0 —1 
=| 0 —l 1 
1 1 0 
That is, [7 : n2 : n3] = [ea : €2 : es]Q. Then the transition matrix from S to L 
—1 0 1 
isQ-'=4] 0 —1 1). Therefore we have 
-1 1 0 
[T]s =Q(T]LQ~* 
-1 38) Po St Pek 01 
Se tests | fe ol] o -ıı 
1 A “Gi eto Sal] ats ı0 
0 2 —2 
=; i 0 =1 
—1 -2 3 
That is,

\[
T[e_1 : e_2 : e_3] = [e_1 : e_2 : e_3][T]_S.
\]


Exercise 8.2.9. 


1. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be a linear operator which has the representation matrix


—4 0 -4 
[Tlz aa 0 -1 -1 > 
3 6 9 


under the basis $L = \{\eta_1, \eta_2, \eta_3\}$ where


—1 0 3 
m=] 0|, n= |1|, n= |-1 
1 1 0 


Find the representation matrix $[T]_S$, where $S = \{e_1, e_2, e_3\}$ is the standard basis of $\mathbb{R}^3$.

2. Let $S = \{v_1, v_2, v_3, v_4\}$ be a basis of a vector space $V$. Let $T : V \to V$ be a linear transformation with


1 3 10 
0 2 -6 0 
7ls=|o o 40 
0 -9 -3 1 


i) Find $[T]_N$ if $N$ is a basis of $V$ with the transition matrix from $N$ to $S$ given by


1 31 
0 2 3 
> [00.8 
0 0 0 
ii) Find ker(T) and Range(T). 
iii) Find a basis of ker(T) and extend it into a basis of V. 
iv) Find a basis of Range(T) and extend it into a basis of V. 


3. Let $T : V \to V$ be a linear operator. Show that $TV \subseteq \ker(T)$ if and only if $T^2 = 0$.


4. Let $V$ be a finite dimensional vector space. Let $T_1, T_2 : V \to V$ be linear operators. Show that $T_1 V \subseteq T_2 V$ if and only if there exists a linear operator $T$ on $V$ such that $T_1 = T_2 T$.


5. Let $V$ be a finite dimensional vector space. Let $T_1, T_2 : V \to V$ be linear operators. Show that $\ker T_1 \subseteq \ker T_2$ if and only if there exists a linear operator $T$ on $V$ such that $T_2 = T T_1$.


8.3 Invariant subspaces 


Definition 8.3.1. Let $V$ be a vector space and $W \subseteq V$ a subspace. Let $T : V \to V$ be a linear operator. If $TW \subseteq W$, that is, for every $w \in W$, $Tw \in W$, then we call $W$ a $T$-invariant subspace of $V$.


Example 8.3.2. Let $T : V \to V$ be a linear operator. Then the linear space $V$ and $\{0\}$ are $T$-invariant. □


Lemma 8.3.3. Let $T : V \to V$ be a linear operator. Then $\ker(T)$ and $\mathrm{Range}(T)$ are $T$-invariant subspaces of $V$.

Proof. For every $w \in \ker(T)$ we have $Tw = 0 \in \ker(T)$ and hence $T(\ker(T)) \subseteq \ker(T)$. For every $y \in \mathrm{Range}(T)$, there exists $v \in V$ such that $y = Tv \in V$ and $Ty = T(Tv) \in \mathrm{Range}(T)$. □


We are interested in one-dimensional $T$-invariant subspaces, say, $W = \mathrm{span}(v)$ with $v \neq 0$. If $W$ is $T$-invariant, then for every $w = av \in W$ with $a \neq 0$ a scalar, $Tw = kv$ for some scalar $k$, and hence $Tv = (k/a)v$. This motivates the following definition.


Definition 8.3.4. Let $V$ be a vector space over the scalar field $K$ and $T : V \to V$ a linear operator. If $x \neq 0$ is such that

\[
Tx = \lambda x,
\]

for some scalar $\lambda \in K$, we call $x$ an eigenvector of $T$ associated with the eigenvalue $\lambda$.


If $V$ is finite dimensional, there exists a matrix representation $[T]$ under a given basis. Then $T(v) = \lambda v$ can be rewritten as

\[
[T][v] = \lambda [v],
\]

where $[v]$ is the coordinate vector of $v$. Therefore, if $W = \mathrm{span}(v)$ is a one-dimensional $T$-invariant subspace of $V$, then the coordinate vector of the spanning vector is an eigenvector of the representation matrix.

Conversely, if $v \neq 0$ is an eigenvector of $T$ associated with the eigenvalue $\lambda$, then for every $w \in W = \mathrm{span}(v)$, we have $w = av$ for some $a \in K$ and $Tw = aTv = a\lambda v = \lambda w \in W$. That is, $W$ is $T$-invariant.

We have arrived at 


Lemma 8.3.5. Let $T : V \to V$ be a linear operator. $W$ is a one-dimensional $T$-invariant subspace of $V$ if and only if $W = \mathrm{span}(v)$ for some eigenvector $v$ of $T$.


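Remark 8.3.6. For linear operators on general vector spaces, we usually specify the scalar field on which we have eigenvalues and eigenvectors. For example, the map $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

\[
T(x) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\]

has no eigenvalues in the underlying scalar field $\mathbb{R}$. But $T$ has eigenvalues $\lambda_{1,2} = \pm i$ in $\mathbb{C}$ if $T$ is defined on $\mathbb{C}^2$. □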

We know that every linear operator $T$ on a finite dimensional space $V$ has a square matrix representation $[T]$, which depends on the choice of the basis of the vector space. A question is how to choose a proper basis of $V$ such that the matrix representation $[T]$ is diagonal.

Suppose $S = [v_1 : v_2 : \cdots : v_n]$ is a basis of $V$. We seek a basis $L = [w_1 : w_2 : \cdots : w_n]$ of $V$ such that $[T]_L$ is diagonal. By Theorem 8.2.7, we have

\[
[T]_L = P^{-1}[T]_S P,
\]

where $P$ is the transition matrix from $L$ to $S$. To have a diagonal $[T]_L$ we need the columns of the invertible matrix $P$ to be eigenvectors of $[T]_S$. Namely, we need only to choose $L = [w_1 : w_2 : \cdots : w_n]$ such that

\[
[w_1 : w_2 : \cdots : w_n] = [v_1 : v_2 : \cdots : v_n]P.
\]

In summary, we have


Lemma 8.3.7. Let $T : V \to V$ be a linear operator on an $n$ dimensional vector space $V$ with a basis $S = [v_1 : v_2 : \cdots : v_n]$. If $[T]_S$ is diagonalizable, then $N = [v_1 : v_2 : \cdots : v_n]P$ is a basis of $V$ such that $[T]_N$ is diagonal, where $P$ is such that

\[
P^{-1}[T]_S P = D \quad \text{for some diagonal matrix } D.
\]
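
The change of basis in Lemma 8.3.7 can be tried on a small case. The following is a minimal sketch in plain Python; the $2 \times 2$ matrix `T_S`, its eigenvectors, and the helper names `matmul` and `inverse_2x2` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(P):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


T_S = [[2, 1], [0, 3]]       # representation matrix under the basis S
# Columns of P are eigenvectors of T_S: (1, 0) for 2 and (1, 1) for 3.
P = [[1, 1], [0, 1]]

T_N = matmul(inverse_2x2(P), matmul(T_S, P))
print(T_N == [[2, 0], [0, 3]])   # True: [T]_N is diagonal
```

The columns of `P` are the coordinates, with respect to $S$, of the new basis vectors, so the new basis is $[v_1 : v_2]P$ as in the lemma.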


Example 8.3.8. Let $S = \{v_1, v_2, v_3, v_4\}$ be a basis of a vector space $V$. Let $T : V \to V$ be a linear transformation with


1 3 1 


i) Find $[T]_N$ if $N$ is a basis of $V$ with the transition matrix from $N$ to $S$ given by


ii) Find a basis $L$ of $V$ such that $[T]_L$ is diagonal.

Solution: i) Since $P$ is the transition matrix from $N$ to $S$, we have

\[
[T]_N = P^{-1}[T]_S P
\]


-1 
oa a i 
10 03 0 0 0 4 O} 10 0 3 0 
su ee RI E 
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ay? d alde “3 
|e Ge 3 
0 0 4 0 
0 -2 -12 —4 


ii) To find a basis $L$ such that $[T]_L$ is diagonal, we need to find the eigenvalues of $[T]_S$ and a diagonalizing matrix $Q$ whose columns are eigenvectors of $[T]_S$. Solving $\det([T]_S - \lambda I) = 0$, we have


A-09A+DA+DA-1)=0=> à = 4, X =2,A3 =-1, M` =1. 


Corresponding to the eigenvalues $\lambda_i$, $i = 1, 2, 3, 4$, the eigenvectors $q_i$ are chosen:


—2 0 3 1 

3 0 —2 0 

q = -15 » q2 = 0 » 43 = 0 > q4 = 0 
7 1 2 0 


That is, if we choose Q = [qı : q2 : q3 : q4] as the transition matrix from basis 
L to S, where 


-2 0 3 1 
3 0 —2 0 
=[01 V2 : U3 : Un] 15 0 o ol» 
7 1 2 0 
4 0 0 0 
then [7] = a pd de E , which is diagonal. O 
0 0 1 


Theorem 8.3.9. Let $T : V \to V$ be a linear operator on an $n$ dimensional vector space $V$ over the complex numbers $\mathbb{C}$ with a basis $S = [v_1 : v_2 : \cdots : v_n]$. If $[T]_S$ is diagonalizable, then $V$ assumes a direct sum decomposition of the eigenspaces:

\[
V = W_{\lambda_1} \oplus W_{\lambda_2} \oplus \cdots \oplus W_{\lambda_n},
\]

where $W_{\lambda_i}$, $i = 1, 2, \cdots, n$ are eigenspaces of $T$ corresponding to the eigenvalues $\lambda_i$, $i = 1, 2, \cdots, n$.


Proof. Note that $[T]_S$ is diagonalizable and has $n$ linearly independent eigenvectors. Then correspondingly $T$ has $n$ linearly independent eigenvectors which span the whole space $V$. That is, for every $x \in V$, there exist $x_i \in W_{\lambda_i}$ such that

\[
x = x_1 + x_2 + \cdots + x_n.
\]

Since the $n$ eigenvectors are linearly independent, we have

\[
W_{\lambda_i} \cap W_{\lambda_j} = \{0\}, \quad \text{if } i \neq j.
\]

Therefore, we have

\[
V = W_{\lambda_1} \oplus W_{\lambda_2} \oplus \cdots \oplus W_{\lambda_n}.
\]

□


Exercise 8.3.10. 


1. Let $S = \{v_1, v_2, v_3\}$ be a basis of a vector space $V$. Let $T : V \to V$ be a linear transformation with


1 3 1 
[T]s = |0 27 —6 
0 0 4 
i) Find $[T]_N$ if $N$ is a basis of $V$ with the transition matrix from $N$ to $S$ given by


ii) Find a basis $L$ of $V$ such that $[T]_L$ is diagonal.


2. Let $P_3$ denote the vector space of all polynomials of degree at most 3 with real coefficients. Let $T : P_3 \to P_3$ be defined by

\[
T(f)(x) = x^2 f''(x).
\]

i) Find $[T]$ under the standard basis $\{1, x, x^2, x^3\}$ of $P_3$;
ii) Find all eigenvalues and eigenvectors of $T$.


3. Let $S, T : V \to V$ be linear operators on a vector space $V$. Suppose that $S \circ T = T \circ S$. Show that the range of $S$ is a $T$-invariant subspace of $V$.


4. Let $T : V \to V$ be a linear operator on an $n$-dimensional vector space $V$ over the real numbers $\mathbb{R}$ with a basis $S = [v_1 : v_2 : \cdots : v_n]$. Show that if $[T]_S$ is symmetric, then $V$ assumes a direct sum decomposition of the eigenspaces:

\[
V = W_{\lambda_1} \oplus W_{\lambda_2} \oplus \cdots \oplus W_{\lambda_n},
\]

where $W_{\lambda_i}$, $i = 1, 2, \cdots, n$ are eigenspaces of $T$ corresponding to the eigenvalues $\lambda_i$, $i = 1, 2, \cdots, n$.

5. Let $T : V \to V$ be a linear operator on a vector space $V$. Let $f, g$ be polynomials. Show that $f(T)g(T) = g(T)f(T)$.


6. Let $T : V \to V$ be a linear operator on a vector space $V$. Let $f$ be a polynomial and $W = \ker f(T)$. Show that $W$ is a $T$-invariant subspace of $V$.


8.4 Decomposition of vector spaces


In the last section we have related the problem of diagonalization of the 
representation matrix of a linear operator on finite dimensional spaces to that 
of decomposition of a vector space into one-dimensional subspaces. 

A natural question is what if the representation matrix is not diagonaliz- 
able? We show in the next two sections that in this case, the representation 
matrix can be reduced into block diagonal form and further to Jordan 
normal form, which is a block diagonal form of a matrix. Moreover, the vec- 
tor space is decomposed into a direct sum of generalized eigenspaces associated 
with each of the Jordan blocks. 


Theorem 8.4.1. Let $p$, $p_1$ and $p_2$ be polynomials which satisfy that

i) $p = p_1 p_2$;

ii) $p_1$ and $p_2$ are polynomials with degree larger than or equal to 1;

iii) the greatest common divisor of $p_1$ and $p_2$ is 1.

Let $T : V \to V$ be a linear operator on a vector space $V$. If $p(T) = 0$, then

\[
V = W_1 \oplus W_2,
\]

where $W_1 = \ker(p_1(T))$ and $W_2 = \ker(p_2(T))$.


Proof. Since the greatest common divisor of $p_1$ and $p_2$ is 1, by the Euclidean algorithm, there exist polynomials $q_1$ and $q_2$ such that

\[
1 = p_1(t)q_1(t) + p_2(t)q_2(t),
\]

which leads to

\[
I = p_1(T)q_1(T) + p_2(T)q_2(T).
\]

Then for every $v \in V$, we have the following decomposition of $v$,

\[
v = p_1(T)q_1(T)v + p_2(T)q_2(T)v, \tag{8.1}
\]

where the first part satisfies

\[
p_2(T)p_1(T)q_1(T)v = q_1(T)p_1(T)p_2(T)v = q_1(T)p(T)v = 0,
\]

and the second part satisfies

\[
p_1(T)p_2(T)q_2(T)v = q_2(T)p_1(T)p_2(T)v = q_2(T)p(T)v = 0.
\]

That is, $p_1(T)q_1(T)v \in W_2$ and $p_2(T)q_2(T)v \in W_1$. To prove the uniqueness of the decomposition, let $v = w_1 + w_2 = w_1' + w_2'$ with $w_1, w_1' \in W_1$ and $w_2, w_2' \in W_2$. Then we have

\[
q_1(T)p_1(T)v = q_1(T)p_1(T)(w_1 + w_2), \quad q_1(T)p_1(T)v = q_1(T)p_1(T)(w_1' + w_2'),
\]

which, since $p_1(T)w_1 = p_1(T)w_1' = 0$, lead to

\[
q_1(T)p_1(T)v = q_1(T)p_1(T)w_2, \quad q_1(T)p_1(T)v = q_1(T)p_1(T)w_2'.
\]

Applying (8.1) to $w_2$, $w_2'$ and using $p_2(T)w_2 = p_2(T)w_2' = 0$, we have

\[
w_2 = p_1(T)q_1(T)w_2, \quad w_2' = p_1(T)q_1(T)w_2'.
\]

Then we have

\[
w_2 = q_1(T)p_1(T)v = w_2',
\]

and $w_1 = v - w_2 = v - w_2' = w_1'$. That is, $w_1 = w_1'$, $w_2 = w_2'$. The decomposition is unique. □
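
As a sketch of how the decomposition (8.1) looks in a concrete case, take the illustrative choice $p(t) = t^2 - 1$ with $p_1(t) = t - 1$ and $p_2(t) = t + 1$. This particular $p$ is an assumption made here for illustration only; it fits any operator with $T^2 = I$. One admissible pair of cofactors is $q_1(t) = -\tfrac12$ and $q_2(t) = \tfrac12$:

\[
1 = -\tfrac12 (t - 1) + \tfrac12 (t + 1), \qquad
v = \underbrace{\tfrac12 (v - Tv)}_{p_1(T)q_1(T)v \,\in\, W_2} + \underbrace{\tfrac12 (v + Tv)}_{p_2(T)q_2(T)v \,\in\, W_1}.
\]

Indeed, $(T + I)(v - Tv) = v - T^2 v = 0$ and $(T - I)(v + Tv) = T^2 v - v = 0$, so the two parts lie in $W_2 = \ker(T + I)$ and $W_1 = \ker(T - I)$ respectively.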


An example of the polynomial p such that p(T) = 0 can be seen from 
the Cayley-Hamilton theorem, where we know that if p is the characteristic 
polynomial of the representation matrix A of the linear operator T, then 
p(A) = 0. In the following we apply Theorem 8.4.1 to obtain a decomposition 
of a finite dimensional vector space according to the eigenspaces of a linear 
operator on the space. 


Theorem 8.4.2. Let $T : V \to V$ be a linear operator on a finite dimensional vector space $V$ over the scalar field of complex numbers. Assume that $p$ is a polynomial with $p(T) = 0$ and that

\[
p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_r)^{m_r}
\]

is a factorization of $p$ where $\lambda_i$, $i = 1, 2, \cdots, r$ are distinct roots of $p$. Let $W_i = \ker(T - \lambda_i I)^{m_i}$, $i = 1, 2, \cdots, r$. Then we have

\[
V = W_1 \oplus W_2 \oplus \cdots \oplus W_r.
\]


Proof. Let $p_i = (t - \lambda_i)^{m_i}$ and $p_{-i} = (t - \lambda_{i+1})^{m_{i+1}}(t - \lambda_{i+2})^{m_{i+2}} \cdots (t - \lambda_r)^{m_r}$ with $i = 1, 2, \cdots, r - 1$. Since $p = p_1 p_{-1}$ and the greatest common divisor of $p_1$ and $p_{-1}$ is 1, by Theorem 8.4.1 we have the following decomposition of $V$:

\[
V = W_1 \oplus W_{-1},
\]

where $W_{-1} = \ker(p_{-1}(T))$ is a subspace of $V$. Notice that $p_{-1}(T)w = 0$ for every $w \in W_{-1}$. Regarding $W_{-1}$ as the whole space, by Theorem 8.4.1 applied to $p_{-1} = p_2 p_{-2}$, we have the following decomposition of $W_{-1}$:

\[
W_{-1} = W_2 \oplus W_{-2}.
\]

It is known that $W_2$ is a subspace of $V$. $W_{-2}$ is a subspace of $W_{-1}$; hence it is also a subspace of $V$. Then we have

\[
V = W_1 \oplus W_2 \oplus W_{-2}.
\]

By mathematical induction, we can then obtain that

\[
V = W_1 \oplus W_2 \oplus \cdots \oplus W_r.
\]

□


Remark 8.4.3. By Theorem 8.4.2, we can group the bases of each of the subspaces $W_i$, $i = 1, 2, \cdots, r$ to obtain a basis of $V$. Then the representation matrix of $T$ on $V$ is in block diagonal form:

\[
[T] = \begin{bmatrix}
A_1 & 0 & \cdots & 0 \\
0 & A_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_r
\end{bmatrix},
\]

where $A_i$ is the representation matrix of $T$ restricted to the subspace $W_i = \ker(T - \lambda_i I)^{m_i}$, which is called a generalized eigenspace of $T$. □


Example 8.4.4. Let $T : V \to V$ be a linear operator on an $n$ dimensional vector space. Suppose that $T^2 = I$. Show that

i) $T$ has eigenvalues $\lambda_1 = 1$, $\lambda_2 = -1$.

ii) $V = W_{\lambda_1} \oplus W_{\lambda_2}$, where $W_{\lambda_i}$, $i = 1, 2$ are eigenspaces of $T$ corresponding to the eigenvalues $\lambda_i$.

Solution: i) Let $\lambda$ be an eigenvalue with $v$ a corresponding eigenvector. Then we have

\[
Tv = \lambda v \;\Rightarrow\; T^2 v = \lambda Tv = \lambda^2 v.
\]

Since $T^2 = I$, we have $T^2 v = v$. Therefore we obtain

\[
\lambda^2 v = v \;\Rightarrow\; \lambda = \pm 1.
\]

ii) Let $p(t) = (t - 1)(t + 1)$. We have $p(T) = T^2 - I = 0$. By Theorem 8.4.2, we have

\[
V = W_{\lambda_1} \oplus W_{\lambda_2},
\]

where $W_{\lambda_i}$, $i = 1, 2$ are eigenspaces of $T$ corresponding to the eigenvalues $\lambda_i$. □

Exercise 8.4.5.


1. Let $T : V \to V$ be a linear operator on an $n$ dimensional vector space. Suppose that $T^2 = T$. Show that

i) $T$ has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 0$.

ii) $V = W_{\lambda_1} \oplus W_{\lambda_2}$, where $W_{\lambda_i}$, $i = 1, 2$ are eigenspaces of $T$ corresponding to the eigenvalues $\lambda_i$.


2. Let $T : V \to V$ be a linear operator on an $n$ dimensional vector space. Let $\lambda_i$, $i = 1, 2, \cdots, m$ be the set of all distinct eigenvalues of $T$. Suppose that the representation matrix $[T]$ under a basis of $V$ is diagonalizable. Show that there exist linear operators $T_i$, $i = 1, 2, \cdots, m$ such that


3. Let $p$, $p_1$ and $p_2$ be polynomials which satisfy that

i) $p = p_1 p_2$;

ii) $p_1$ and $p_2$ are polynomials with degree larger than or equal to 1;

iii) the greatest common divisor of $p_1$ and $p_2$ is 1.

Let $T : V \to V$ be a linear operator on a vector space $V$. Show that if $p(T) = 0$, then $\ker(p_1(T)) = \mathrm{Range}(p_2(T))$.


8.5 Jordan normal form 


We improve the results of the previous section to show that the blocks of the block diagonal decomposition of the representation matrix $[T]$ can be reduced to a type of triangular matrices called Jordan blocks, if the bases of the decomposed subspaces are properly chosen.

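Let $T : V \to V$ be a linear operator on the $n$ dimensional vector space $V$ and $v \in V$ be a nonzero vector. If we repeatedly use $T$ to act on $v$ to produce the sequence

\[
v, \; Tv, \; T^2 v, \; \cdots,
\]

then the vectors cannot remain linearly independent forever, because of the finite dimensionality of $V$. If moreover $T$ is nilpotent, as $T - \lambda I$ is on a generalized eigenspace, then there exists a positive integer $k \in \mathbb{N}$ such that $T^k v = 0$.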

With the decomposition in Theorem 8.4.2, we seek a basis of $W = \ker(T - \lambda I)^k$ where $\lambda$ is an eigenvalue of $T$. Let $v \in W$ be such that $(T - \lambda I)^{k-1} v \neq 0$. Then $(T - \lambda I)^i v \neq 0$ for every $0 \le i \le k - 1$. Consider the vector equation

\[
c_0 v + c_1 (T - \lambda I)v + c_2 (T - \lambda I)^2 v + \cdots + c_{k-1}(T - \lambda I)^{k-1} v = 0.
\]

Applying $(T - \lambda I)^{k-1}, (T - \lambda I)^{k-2}, \cdots, (T - \lambda I)$ in turn to both sides of the above vector equation and using the fact that $(T - \lambda I)^k v = 0$, we successively obtain $c_0 = 0$, $c_1 = 0$, $\cdots$, $c_{k-1} = 0$. Therefore, the set of vectors

\[
\{(T - \lambda I)^{k-1} v, \; (T - \lambda I)^{k-2} v, \; \cdots, \; (T - \lambda I)v, \; v\}
\]

is linearly independent. We have shown


Lemma 8.5.1. Let $v \in V$ be such that $(T - \lambda I)^k v = 0$ but $(T - \lambda I)^{k-1} v \neq 0$. Then the set of vectors

\[
\{(T - \lambda I)^{k-1} v, \; (T - \lambda I)^{k-2} v, \; \cdots, \; (T - \lambda I)v, \; v\}
\]

is linearly independent.


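Lemma 8.5.1 implies that

\[
\{(T - \lambda I)^{k-1} v, \; (T - \lambda I)^{k-2} v, \; \cdots, \; (T - \lambda I)v, \; v\}
\]

is an ordered basis of the subspace it spans, and of $W = \ker(T - \lambda I)^k$ itself when $\dim W = k$, if $v \in W$ is such that $(T - \lambda I)^{k-1} v \neq 0$. Note that

\[
\begin{aligned}
T(T - \lambda I)^i v &= (T - \lambda I)^{i+1} v + \lambda (T - \lambda I)^i v, \quad \text{for every } 0 \le i \le k - 2, \\
T(T - \lambda I)^{k-1} v &= (T - \lambda I)^k v + \lambda (T - \lambda I)^{k-1} v = \lambda (T - \lambda I)^{k-1} v.
\end{aligned}
\]

Then the representation matrix of $T$ on $W$ with respect to this basis is

\[
\begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \lambda & 1 \\
0 & \cdots & 0 & 0 & \lambda
\end{bmatrix},
\]

which is called the Jordan block associated with the eigenvalue $\lambda$ of $T$. The basis

\[
\{(T - \lambda I)^{k-1} v, \; (T - \lambda I)^{k-2} v, \; \cdots, \; (T - \lambda I)v, \; v\}
\]

is called a Jordan basis and the finite sequence of vectors is called a Jordan chain. The vector $(T - \lambda I)^{k-1} v$ is an eigenvector of $T$ and is called the initial vector of the Jordan chain. $v$ is called the end of the Jordan chain.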

Suppose that $V$ has been decomposed into

\[
V = W_1 \oplus W_2 \oplus \cdots \oplus W_r,
\]

where $W_i = \ker(T - \lambda_i I)^{m_i}$, $i = 1, 2, \cdots, r$. If we choose a Jordan basis for each of the $W_i$, then the union of the Jordan bases is a basis for $V$ and the representation matrix of $T$ on $V$ is then

\[
[T] = \begin{bmatrix}
J_1 & 0 & \cdots & 0 \\
0 & J_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & J_r
\end{bmatrix},
\]

where $J_i$, $i = 1, 2, \cdots, r$ are Jordan blocks. We call this form of $[T]$ the Jordan normal form for $T$.


Theorem 8.5.2. Let T : V > V be a linear operator on a finite 
dimensional vector space V over the scalar field of complex numbers. 


Then there exists a Jordan basis of V with which the representation 
matrix [T] is in the Jordan normal form. 


Proof. Let $T$ be restricted to a generalized eigenspace $W = \ker(T - \lambda_i I)^{m_i}$. Then the representation matrix of $T$ has only one eigenvalue, say $\lambda$. Let $m$ be the least integer such that $(T - \lambda I)^m = 0$ on $W$, namely, for every $w \in W$, $(T - \lambda I)^m w = 0$.

We show that there exists a Jordan basis of $W$ such that the representation matrix of $T$ restricted to $W$ is in Jordan normal form. Then by Theorem 8.4.2, the union of the Jordan bases of all generalized eigenspaces is a basis of $V$ such that the representation matrix $[T]$ is in Jordan normal form.

We proceed with induction on the dimension of $W = \ker(T - \lambda I)^m$. If $\dim W = 1$, the statement of the theorem is trivially true. Assume that for $\dim W \le n - 1$ the statement of the theorem holds. We prove it holds for $\dim W = n$.

Let $B = T - \lambda I$ and $r = \dim \ker B$. Note that there exists $w \in W$ with $B^{m-1} w \neq 0$ and $B^{m-1} w \in \ker B$. Then the formula $\dim BW + \dim \ker B = n$ implies that $BW$ is a proper subspace of $W$ and $\dim BW \le n - 1$. By induction assumption, $BW$ can be decomposed into subspaces $W_i$, $i = 1, 2, \cdots, k$:

\[
BW = W_1 \oplus W_2 \oplus \cdots \oplus W_k,
\]

and for each $W_i$, $i = 1, 2, \cdots, k$, there exists a Jordan chain,

\[
B^{l_i - 1} u_i, \; \cdots, \; B u_i, \; u_i,
\]

and the union of these chains is a Jordan basis of $BW$ with $n - r$ vectors.

Next we extend the union of the Jordan basis of $BW$ into a basis of $W$. Note that for every $u_i \in BW$, $i = 1, 2, \cdots, k$, there exists a nonzero $v_i \in W$ such that $B v_i = u_i$. Then the Jordan basis of $BW$ is extended into

\[
B^{l_i} v_i, \; B^{l_i - 1} v_i, \; \cdots, \; B v_i, \; v_i, \quad i = 1, 2, \cdots, k.
\]

Notice that $B^{l_i} v_i = B^{l_i - 1} u_i \in \ker B$, $i = 1, 2, \cdots, k$. We can extend the set $\{B^{l_i} v_i\}_{i=1}^{k}$ into a basis of $\ker B$ by adding more vectors, say, $w_1, w_2, \cdots, w_{r-k}$, where each of the $w_j$'s is a Jordan chain of length 1.

It remains to show that

\[
\bigcup_{i=1}^{k} \{v_i, \; B v_i, \; \cdots, \; B^{l_i} v_i\} \cup \{w_j\}_{j=1}^{r-k} \tag{8.2}
\]

is linearly independent. Consider the vector equation

\[
\sum_{i=1}^{k} \sum_{j=0}^{l_i} c_{ij} B^j v_i + \sum_{j=1}^{r-k} d_j w_j = 0, \tag{8.3}
\]

where the $c_{ij}$'s and $d_j$'s are constants to be determined. Applying $B$ on both sides of the vector equation, we have

\[
\sum_{i=1}^{k} \sum_{j=0}^{l_i} c_{ij} B^{j+1} v_i = 0 = \sum_{i=1}^{k} \sum_{j=0}^{l_i} c_{ij} B^j u_i.
\]

Note that $B^{l_i} u_i = 0$ for every $i = 1, 2, \cdots, k$. Then we have

\[
\sum_{i=1}^{k} \sum_{j=0}^{l_i - 1} c_{ij} B^j u_i = 0,
\]

which by induction assumption leads to $c_{ij} = 0$ for every $i = 1, 2, \cdots, k$, $j = 0, 1, \cdots, l_i - 1$. Then by (8.3) we have

\[
\sum_{i=1}^{k} c_{i l_i} B^{l_i} v_i + \sum_{j=1}^{r-k} d_j w_j = 0,
\]

which leads to $c_{i l_i} = 0$, for $i = 1, 2, \cdots, k$ and $d_j = 0$, $j = 1, 2, \cdots, r - k$ since this is a linear combination of the basis of $\ker B$. Therefore, the set of $n$ vectors at (8.2) is linearly independent and is a Jordan basis of $W$.

□


Remark 8.5.3. If we identify a linear operator with its matrix representation, 
Theorem 8.5.2 indicates that every n x n matrix over the complex numbers 
has a Jordan normal form. o 

Example 8.5.4. Let

\[
A = \begin{bmatrix} -1 & -2 & 6 \\ -1 & 0 & 3 \\ -1 & -1 & 4 \end{bmatrix}
\]

be the representation matrix of a linear operator $T : V \to V$. Find the Jordan normal form of $A$.


Solution: Let $p(\lambda) = \det(A - \lambda I)$ be the characteristic polynomial. Solving $p(\lambda) = \det(A - \lambda I) = 0$, we have $-(\lambda - 1)^3 = 0$ and the eigenvalues $\lambda_{1,2,3} = 1$. Solving $(A - \lambda I)x = 0$ we obtain two linearly independent eigenvectors:

\[
u_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}.
\]

Since the algebraic multiplicity of the eigenvalue $\lambda = 1$ is 3, but the geometric multiplicity is 2, by Theorem 6.2.5 $A$ is not diagonalizable. By Theorem 8.5.2, there exists a Jordan basis such that the representation matrix of $T$ is in Jordan form, or, equivalently, $A$ is similar to a matrix in Jordan form.

Next we compute the least number $m$ such that $(A - \lambda I)^m = 0$. We have

\[
A - I = \begin{bmatrix} -2 & -2 & 6 \\ -1 & -1 & 3 \\ -1 & -1 & 3 \end{bmatrix}, \quad (A - I)^2 = 0.
\]

This means that a basis of generalized eigenvectors consists of a Jordan chain of length 2 and one of length 1. The Jordan form is

\[
J = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Exercise 8.5.5. 


1. Let $A$ be an $n \times n$ complex matrix. Show that $A$ can be decomposed into the sum of a diagonal matrix and a nilpotent matrix. That is, $A = D + N$, where $D$ is diagonal and $N^m = 0$ for some $m \in \mathbb{N}$.

2. Let A= f k where € € R is a parameter. Show that if e = 0, A is not 
diagonalizable; otherwise, A is diagonalizable. 


8.6 Computation of Jordan normal form 


In the last section we have proved existence of Jordan normal form. We 
show in this section that the Jordan normal form is unique up to the order of 
Jordan blocks. We also develop a method to compute Jordan normal form. 

Theorem 8.6.1. Let $T : V \to V$ be a linear operator on an $n$ dimensional vector space $V$ over the scalar field of complex numbers. Let $\lambda \in \mathbb{C}$ be an eigenvalue of $T$. Then for every positive integer $m \in \mathbb{N}$, the number of $m \times m$ Jordan blocks associated with $\lambda$ is

\[
N_m = \mathrm{rank}(T - \lambda I)^{m-1} - 2\,\mathrm{rank}(T - \lambda I)^{m} + \mathrm{rank}(T - \lambda I)^{m+1}.
\]


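Proof. Suppose that under a Jordan basis, $T$ is in a Jordan normal form with Jordan blocks $J_{\lambda_i, m_i}$, $i = 1, 2, \cdots, k$ and $m_1 + m_2 + \cdots + m_k = n$,

\[
[T] = \begin{bmatrix}
J_{\lambda_1, m_1} & 0 & \cdots & 0 \\
0 & J_{\lambda_2, m_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & J_{\lambda_k, m_k}
\end{bmatrix},
\]

where the Jordan blocks associated with the same eigenvalue are grouped together. Then the map $T - \lambda I$, where $\lambda$ is one of the eigenvalues, has a Jordan normal form with two types of Jordan blocks. One type is those with zero main diagonals, and the other with nonzero main diagonals. Namely, writing $\mu_i = \lambda_i - \lambda$ and assuming that the first $s$ blocks are those with $\lambda_i = \lambda$, we have

\[
[T - \lambda I] = \begin{bmatrix}
J_{0, m_1} & & & & & \\
 & \ddots & & & & \\
 & & J_{0, m_s} & & & \\
 & & & J_{\mu_{s+1}, m_{s+1}} & & \\
 & & & & \ddots & \\
 & & & & & J_{\mu_k, m_k}
\end{bmatrix},
\]

where we placed the blocks with zero main diagonals in the left uppermost positions just for convenience of visualization.

Notice that $\mathrm{rank}(J_{0, m_i}^{\,j}) = m_i - j$ for every $0 \le j \le m_i$ and $J_{0, m_i}^{\,m_i} = 0$. Moreover, for every Jordan block $J_{\mu, m_i}$ in $[T - \lambda I]$ with $\mu \neq 0$, $\mathrm{rank}(J_{\mu, m_i}^{\,j}) = m_i$, for every positive $j \in \mathbb{N}$.

For every $m \in \mathbb{N}$, we note that $\mathrm{rank}(J_{0, m}^{\,m}) = 0$, $\mathrm{rank}(J_{0, m+1}^{\,m}) = 1$, $\mathrm{rank}(J_{0, m+2}^{\,m}) = 2$ and so on. Then we have

\[
\mathrm{rank}(T - \lambda I)^m = N_{m+1} + 2N_{m+2} + \cdots + (n - m)N_n + \sum_{\mu_i \neq 0} \mathrm{rank}(J_{\mu_i, m_i}^{\,m}).
\]

By the same token, we have

\[
\mathrm{rank}(T - \lambda I)^{m+1} = N_{m+2} + 2N_{m+3} + \cdots + (n - m - 1)N_n + \sum_{\mu_i \neq 0} \mathrm{rank}(J_{\mu_i, m_i}^{\,m+1}).
\]

Since the two sums over $\mu_i \neq 0$ are equal, we have

\[
\mathrm{rank}(T - \lambda I)^m - \mathrm{rank}(T - \lambda I)^{m+1} = N_{m+1} + N_{m+2} + \cdots + N_n. \tag{8.4}
\]

Since $m$ is arbitrary, we have

\[
\mathrm{rank}(T - \lambda I)^{m-1} - \mathrm{rank}(T - \lambda I)^m = N_m + N_{m+1} + \cdots + N_n. \tag{8.5}
\]

Note that if $m = 1$, (8.5) is valid and we have $\mathrm{rank}(T - \lambda I)^{m-1} = \mathrm{rank}(T - \lambda I)^0 = n$. This is because each Jordan block with all zeros in the main diagonal is 1 rank in deficiency from the full rank and $N_1 + N_2 + \cdots + N_n$ counts the total rank deficiencies for every Jordan block. Therefore, subtracting (8.4) from (8.5) gives

\[
N_m = \mathrm{rank}(T - \lambda I)^{m-1} - 2\,\mathrm{rank}(T - \lambda I)^{m} + \mathrm{rank}(T - \lambda I)^{m+1}.
\]

□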


Remark 8.6.2. By Theorem 8.6.1 we know that the number of a specific 
size of Jordan blocks is independent of the choice of the Jordan basis since 
representation matrices under different bases are similar. Hence the Jordan 


normal form is unique up to the order of the Jordan blocks. O 
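
Example 8.6.3. Let

\[
A = \begin{bmatrix} -1 & -2 & 6 \\ -1 & 0 & 3 \\ -1 & -1 & 4 \end{bmatrix}
\]

be the representation matrix of a linear operator $T : V \to V$. Find the number of $1 \times 1$, $2 \times 2$ and $3 \times 3$ Jordan blocks of $A$.


Solution: Let $p(\lambda) = \det(A - \lambda I)$ be the characteristic polynomial. Solving $p(\lambda) = \det(A - \lambda I) = 0$, we have $-(\lambda - 1)^3 = 0$ and the eigenvalues $\lambda_{1,2,3} = 1$. By Theorem 8.6.1, we know that for the eigenvalue $\lambda_{1,2,3} = 1$, we have

\[
\begin{aligned}
N_1 &= \mathrm{rank}(A - I)^0 - 2\,\mathrm{rank}(A - I)^1 + \mathrm{rank}(A - I)^2, \\
N_2 &= \mathrm{rank}(A - I)^1 - 2\,\mathrm{rank}(A - I)^2 + \mathrm{rank}(A - I)^3, \\
N_3 &= \mathrm{rank}(A - I)^2 - 2\,\mathrm{rank}(A - I)^3 + \mathrm{rank}(A - I)^4.
\end{aligned}
\]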



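We have

\[
A - I = \begin{bmatrix} -2 & -2 & 6 \\ -1 & -1 & 3 \\ -1 & -1 & 3 \end{bmatrix}, \quad (A - I)^2 = 0.
\]

We have $\mathrm{rank}(A - I)^0 = 3$, $\mathrm{rank}(A - I)^1 = 1$, $\mathrm{rank}(A - I)^m = 0$ for $m \ge 2$. Therefore, we have

\[
\begin{aligned}
N_1 &= 3 - 2 \cdot 1 + 0 = 1, \\
N_2 &= 1 - 2 \cdot 0 + 0 = 1, \\
N_3 &= 0.
\end{aligned}
\]

The Jordan form is then

\[
J = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.
\]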

Theorem 8.6.4. Let $x_1, x_2, \cdots, x_k$ be linearly independent eigenvectors of $T$ corresponding to the same eigenvalue $\lambda$. Suppose that for $i = 1, 2, \cdots, k$, there exists $u_i$ such that

\[
x_1 = (T - \lambda I)^{m_1 - 1} u_1, \quad \cdots, \quad x_k = (T - \lambda I)^{m_k - 1} u_k.
\]

Let $J_i$ be the Jordan chain $\{(T - \lambda I)^{m_i - 1} u_i, \; (T - \lambda I)^{m_i - 2} u_i, \; \cdots, \; u_i\}$. Then $J = \bigcup_{i=1}^{k} J_i$ is linearly independent.


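Proof. We first show that $J_i \cap J_j = \emptyset$ if $i \neq j$. Otherwise, there exist $k, l$
(here $k$ denotes a position in a chain, not the number of chains) with
$1 \leq k \leq m_i$ and $1 \leq l \leq m_j$ such that

$$(T - \lambda I)^{m_i - k} u_i = (T - \lambda I)^{m_j - l} u_j. \tag{8.6}$$

Then applying $(T - \lambda I)^{k-1}$ and $(T - \lambda I)^{l-1}$ on both sides of (8.6), respectively,
we have

$$x_i = (T - \lambda I)^{m_j - l + k - 1} u_j, \qquad (T - \lambda I)^{m_i - k + l - 1} u_i = x_j.$$

If $k = l$, we have $x_i = x_j$, which is a contradiction since $x_i$ and $x_j$ are linearly
independent eigenvectors. If $k > l$, then $m_j - l + k - 1 \geq m_j$ and we have
$x_i = 0$, which is also a contradiction. If $k < l$, then $m_i - k + l - 1 \geq m_i$ and
we have $x_j = 0$, which is a contradiction, too.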

Next, we show that $\bigcup_{i=1}^{k} J_i$ is linearly independent. Let $W = \operatorname{span} \bigcup_{i=1}^{k} J_i$
and restrict $B = T - \lambda I$ to $W$. Then $BW$ is an invariant subspace of $W$. We
proceed with mathematical induction on the number of vectors in $\bigcup_{i=1}^{k} J_i$.

If the number of vectors in $\bigcup_{i=1}^{k} J_i$ is less than or equal to 1, then the
statement of the theorem is trivially true. Assume that the statement of the
theorem is true when the number of vectors in $\bigcup_{i=1}^{k} J_i$ is at most $n - 1$. We
consider the statement with the number of vectors in $\bigcup_{i=1}^{k} J_i$ equal to $n$.
Then we have $\dim W \leq n$.

Next we find a basis of $BW$. Let $J' = \bigcup_{i=1}^{k} J_i \setminus \{u_1, u_2, \cdots, u_k\}$. Then on the one
hand, we have $\operatorname{span}(J') \subset BW$. On the other hand, for every $v \in BW$, there
exists $w \in W$ such that $v = Bw \in \operatorname{span}(J')$. Therefore, $\operatorname{span}(J') = BW$ and
by induction, $J'$ is a basis for $BW$ since the initial vectors of the Jordan chains
are not changed from $J$ to $J'$. It follows that $\dim BW = n - k$.

Note that the $k$ initial vectors $x_1, x_2, \cdots, x_k$ of the Jordan chains are in $\ker B$
and are linearly independent, so we have $\dim \ker B \geq k$. By Theorem 8.2.3, we have

$$n \geq \dim W = \dim BW + \dim \ker B \geq n - k + k = n.$$
It follows that $\dim W = \dim \operatorname{span}(J) = n$ and $J$ is linearly independent. □


The next question is how to find a Jordan basis to reduce a representation 
matrix into Jordan form. We may proceed with the following steps. 


(1) Find the number of Jordan blocks of every possible size of m and deter- 
mine the Jordan normal form. 


(2) For every eigenvalue $\lambda$ and every Jordan block size $m$, compute $u_i$ such
that
$$(T - \lambda I)^{m} u_i = 0, \qquad (T - \lambda I)^{m-1} u_i \neq 0.$$

Then we obtain a Jordan chain
$$\{(T - \lambda I)^{m-1} u_i, (T - \lambda I)^{m-2} u_i, \cdots, u_i\}.$$
Notice that $(T - \lambda I)^{m-1} u_i$ is an eigenvector of $T$.


(3) Group all Jordan chains to form a Jordan basis, or equivalently an
invertible matrix $M$ such that $M^{-1}[T]M$ is in Jordan normal form.


Example 8.6.5. Let
$$A = \begin{bmatrix} -1 & -2 & 6 \\ -1 & 0 & 3 \\ -1 & -1 & 4 \end{bmatrix}$$
be as in Example 8.6.3. From Example 8.6.3, we know that the eigenvalues are
$\lambda_{1,2,3} = 1$ and the Jordan normal form is

$$J = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

For the Jordan block with size 1, we solve for an eigenvector with

$$(A - I)u = \begin{bmatrix} -2 & -2 & 6 \\ -1 & -1 & 3 \\ -1 & -1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0,$$
which leads to
$$u_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad u_1' = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}.$$

Either $\{u_1\}$ or $\{u_1'\}$ forms a Jordan chain of length 1.

For the Jordan block with size 2, we have $(A - I) \neq 0$ and $(A - I)^2 = 0$.
Take an arbitrary nonzero vector, say $u_2 = (1, 0, 0)$, such that $(A - I)u_2 \neq 0$
and set
$$u_3 = (A - I)u_2 = (-2, -1, -1).$$

Then $\{u_3, u_2\}$ forms a Jordan chain of length 2.

According to Theorem 8.6.4, we need to choose linearly independent initial
vectors for the Jordan chains. Indeed, by inspection we note that both $\{u_3, u_1\}$
and $\{u_3, u_1'\}$ are linearly independent. Therefore, we may choose

$$M = [u_1 : u_3 : u_2] = \begin{bmatrix} -1 & -2 & 1 \\ 1 & -1 & 0 \\ 0 & -1 & 0 \end{bmatrix},$$
or
$$M = [u_1' : u_3 : u_2] = \begin{bmatrix} 3 & -2 & 1 \\ 0 & -1 & 0 \\ 1 & -1 & 0 \end{bmatrix},$$
such that
$$M^{-1}AM = J.$$
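
The three steps above can be sketched in a few lines of Python, assuming NumPy is available. The helper `jordan_chain` is a hypothetical name of ours. It builds the chain generated by a chosen vector and checks the two chain conditions, and the vectors used are the ones chosen in Example 8.6.5, namely $(-1, 1, 0)$ for the block of size 1 and $(1, 0, 0)$ for the block of size 2.

```python
import numpy as np

def jordan_chain(B, u, m):
    """Return [B^(m-1) u, ..., B u, u] for B = A - lambda*I.

    Raises ValueError unless B^(m-1) u != 0 and B^m u = 0.
    """
    chain = [np.linalg.matrix_power(B, j) @ u for j in range(m - 1, -1, -1)]
    if np.allclose(chain[0], 0) or not np.allclose(B @ chain[0], 0):
        raise ValueError("u does not generate a Jordan chain of length m")
    return chain

A = np.array([[-1., -2., 6.],
              [-1., 0., 3.],
              [-1., -1., 4.]])
B = A - np.eye(3)                                      # eigenvalue 1

chain1 = jordan_chain(B, np.array([-1., 1., 0.]), 1)   # chain of length 1
chain2 = jordan_chain(B, np.array([1., 0., 0.]), 2)    # chain of length 2
M = np.column_stack(chain1 + chain2)                   # columns: initial vectors first
J = np.linalg.solve(M, A @ M)                          # M^{-1} A M without forming M^{-1}
print(np.round(J, 10))
```

Solving $MJ = AM$ instead of inverting $M$ is a design choice for numerical stability; the result should be the Jordan normal form found in Example 8.6.3.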


Example 8.6.6. Find the Jordan normal form $J$ of

$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

and find $M$ such that $M^{-1}AM = J$.


Solution: The characteristic polynomial is $\det(A - \lambda I) = (\lambda - 2)^2(\lambda - 1)^2$ and
the eigenvalues are $\lambda_{1,2} = 2$ and $\lambda_{3,4} = 1$. For each eigenvalue, there are possibly
$1 \times 1$ and $2 \times 2$ Jordan blocks. We use Theorem 8.6.1 to determine the number
of each possible Jordan block.

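For $\lambda_{1,2} = 2$, we consider $(A - 2I)$ and have

$$A - 2I = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{bmatrix}, \qquad
(A - 2I)^2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
and
$$(A - 2I)^3 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & 0 & -1 \end{bmatrix}.$$

Then $\operatorname{rank}(A - 2I) = \operatorname{rank}(A - 2I)^2 = \operatorname{rank}(A - 2I)^3 = 2$. By Theorem 8.6.1,
for the eigenvalue $\lambda_{1,2} = 2$, we have

$$\begin{aligned}
N_1 &= \operatorname{rank}(A - 2I)^0 - 2\operatorname{rank}(A - 2I)^1 + \operatorname{rank}(A - 2I)^2 = 4 - 4 + 2 = 2, \\
N_2 &= \operatorname{rank}(A - 2I)^1 - 2\operatorname{rank}(A - 2I)^2 + \operatorname{rank}(A - 2I)^3 = 2 - 4 + 2 = 0.
\end{aligned}$$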


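For $\lambda_{3,4} = 1$, we consider $(A - I)$ and have

$$A - I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad
(A - I)^2 = (A - I)^3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

Then $\operatorname{rank}(A - I) = 3$, $\operatorname{rank}(A - I)^2 = \operatorname{rank}(A - I)^3 = 2$. By Theorem 8.6.1,
for the eigenvalue $\lambda_{3,4} = 1$, we have

$$\begin{aligned}
N_1 &= \operatorname{rank}(A - I)^0 - 2\operatorname{rank}(A - I)^1 + \operatorname{rank}(A - I)^2 = 4 - 6 + 2 = 0, \\
N_2 &= \operatorname{rank}(A - I)^1 - 2\operatorname{rank}(A - I)^2 + \operatorname{rank}(A - I)^3 = 3 - 4 + 2 = 1.
\end{aligned}$$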


Therefore, the Jordan normal form is

$$J = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$


Next we find $M$ such that $M^{-1}AM = J$. For $\lambda_{1,2} = 2$, we solve $(A - 2I)x = 0$
to obtain the corresponding eigenspace

$$N(A - 2I) = \operatorname{span}(u_1, u_1') = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\},$$

where $\{u_1\}$ and $\{u_1'\}$ are Jordan chains of length 1.
For $\lambda_{3,4} = 1$, we solve $(A - I)^2 x = 0$ and look for a solution with $(A - I)x \neq 0$.
The solution space of $(A - I)^2 x = 0$ is spanned by

$$u_2 = (0, -1, 1, 0), \qquad u_2' = (0, -1, 0, 1).$$

If we choose $u_2$ to form a Jordan chain of length 2, then $(A - I)u_2 = (0, 0, 0, 0)$,
which is not desired. If we choose $u_2'$ to form a Jordan chain of length 2, then
$u_2'' = (A - I)u_2' = (0, -1, 1, 0) \neq 0$ and we obtain the desired Jordan chain
$\{u_2'', u_2'\}$ of length two. Then $\{u_1, u_1', u_2'', u_2'\}$ is a Jordan basis for $A$. That is,

$$M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

is such that $M^{-1}AM = J$.


Exercise 8.6.7. 


1. Find the Jordan normal form of the following matrix. 


2 2 3 0 4 4 5 —5 

0 21? 13 3? |4 4 5 —5|' 
2. For each of the following matrices A, find a matrix M such that M~1AM 
is in Jordan normal form. 


10 0 i ie eat 
0 1 -1 -i 

-2 —2 2 
O (la ai ca 
00 0 1 


3. Use the Jordan normal form to prove the Cayley-Hamilton Theorem. 


4. Let $A$ be a square matrix with all eigenvalues $\{\lambda_1, \lambda_2, \cdots, \lambda_k\}$. Show that
for every positive $m \in \mathbb{N}$, the eigenvalues of $A^m$ are $\{\lambda_1^m, \lambda_2^m, \cdots, \lambda_k^m\}$.


5. Let $A$ be a square matrix. Show that the eigenvalues of $A$ are all zeros if
and only if there exists a positive $m \in \mathbb{N}$ such that $A^m = 0$.


6. Let $A$ be a square matrix. Show that if there exists a positive $m \in \mathbb{N}$ such
that $A^m = 0$, then $\det(A + I) = 1$.


7. Let $A$ be a square matrix with $A^m = I$ for some positive $m \in \mathbb{N}$. Show that
$A$ is diagonalizable and every eigenvalue of $A$ is a root of $p(t) = t^m - 1$.


8. Let $A$ be a square matrix with $A^2 = A$. Show that $A$ is diagonalizable and
every eigenvalue of $A$ is a root of $p(t) = t^2 - t$.


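9. Let $A$ be an $n \times n$ matrix with 0 as an eigenvalue repeated $m$ times. Show
that $\operatorname{rank}(A) = \operatorname{rank}(A^2)$ if and only if $\operatorname{rank}(A) = n - m$.

10. Let

$$A = \begin{bmatrix} \lambda & 1 & & & \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ & & & & \lambda \end{bmatrix}$$

be an $n \times n$ Jordan block, $p$ a polynomial. Show that

$$p(A) = \begin{bmatrix}
p(\lambda) & p'(\lambda) & \frac{1}{2!}p''(\lambda) & \cdots & \frac{1}{(n-1)!}p^{(n-1)}(\lambda) \\
0 & p(\lambda) & p'(\lambda) & \cdots & \frac{1}{(n-2)!}p^{(n-2)}(\lambda) \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & p(\lambda) & p'(\lambda) \\
0 & \cdots & 0 & 0 & p(\lambda)
\end{bmatrix}.$$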
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Chapter 9 


Linear programming 


9.1 Extreme points
9.2 Simplex method
    Initial basic feasible solution
    Transfer of extreme points
    Optimality condition
9.3 Simplex tableau


Many optimization problems in management and industry are modeled in the
form of optimizing a linear function over the solution set of a system of linear
equations:

$$\begin{aligned}
&\underset{x}{\text{minimize}} && d^T x \\
&\text{subject to} && Ax = b, \text{ and } x \geq 0,
\end{aligned} \tag{9.1}$$

where $d, x \in \mathbb{R}^n$, $A$ is an $m \times n$ real matrix, $b \in \mathbb{R}^m$ and $x \geq 0$ means that each
coordinate of $x$ is nonnegative. Certainly when $m = n$ and $A$ is invertible, we
may first solve for $x$ and obtain a unique optimal solution if $x \geq 0$ is satisfied
at the same time. The issue is that in practice we may have

$$\operatorname{rank}(A) = m < n, \tag{9.2}$$

and hence $Ax = b$ has infinitely many solutions. It becomes a nontrivial task
to examine the nonnegativity and optimality among infinitely many solutions.
We call the optimization problem at (9.1) a linear programming problem.
We devote this chapter to a short illustration of the mathematical theories
and techniques related to solving this optimization problem.


9.1 Extreme points


Let us begin with an example.

[FIGURE 9.1: Feasible region for (9.20); axes labeled $x_1$ and $x_3$]


Example 9.1.1. Consider
$$\begin{aligned}
&\underset{x}{\text{minimize}} && x_1 + x_2 \\
&\text{subject to} && \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 1, \text{ and } x = (x_1, x_2, x_3) \geq 0.
\end{aligned} \tag{9.3}$$

The linear equation is the plane $x_1 + x_2 + x_3 = 1$, which under the assumption
that $x \geq 0$ gives a triangular region $K$ in $\mathbb{R}^3$, as is shown in Figure 9.1. Every
point in the triangular region $K$, that is, every point which satisfies the
constraints, is called a feasible solution of this linear programming problem.
To find a minimizing point for the linear function $f(x) = x_1 + x_2$, which is called
the objective function, we can do some elementary analysis from the constraints:
$x_1 + x_2 \geq 0$ since $x \geq 0$, and the equality holds only if $x_1 = x_2 = 0$, $x_3 = 1$. Then
we have located the minimizing point $x^* = (0, 0, 1)$ with the optimal value
$f(x^*) = 0 + 0 = 0$.

Among infinitely many points in the region $K$ we successfully find the
optimal solution, which is located in a corner of the feasible region $K$. It is
not by chance that the optimal solution is in the corner. Imagine that we hold
the family of the planes $x_1 + x_2 = c$, $c \in \mathbb{R}$, and move it according to different
values of $c$; the plane becomes closer to the origin as the value of $c \geq 0$ becomes smaller.
The value of $c$ becomes minimal when the plane $x_1 + x_2 = c$ touches the corner
$(0, 0, 1)$. It seems that the corner points are very special. Indeed, they are not
a convex combination of any two other points in the feasible region and are called
extreme points. □

Definition 9.1.2. A set $C$ in a vector space $V$ is called a convex set if for
every $x_1, x_2 \in C$ and for every $\lambda \in (0, 1)$, we have

$$\lambda x_1 + (1 - \lambda) x_2 \in C.$$

That is, the line segment between $x_1$ and $x_2$ is completely contained in $C$. We
call $s y_1 + (1 - s) y_2$ with $y_1, y_2 \in V$ and $s \in (0, 1)$ a convex combination of
$y_1, y_2$.

A point $x$ in a convex set $C$ is called an extreme point if $x$ is not a
convex combination of any two distinct vectors in $C$.


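Example 9.1.3. Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = m < n$ and
$b \in \mathbb{R}^m$. Let
$$K = \{x \in \mathbb{R}^n : Ax = b, \text{ and } x \geq 0\}.$$

Then $K$ is a convex set in $\mathbb{R}^n$. Indeed, for every $x_1, x_2 \in K$, we have $x_1 \geq 0$,
$x_2 \geq 0$, and
$$Ax_1 = b, \qquad Ax_2 = b.$$

It follows that for every $\lambda \in (0, 1)$, $\lambda x_1 + (1 - \lambda)x_2 \geq 0$ and
$$A(\lambda x_1 + (1 - \lambda)x_2) = \lambda Ax_1 + (1 - \lambda)Ax_2 = \lambda b + (1 - \lambda) b = b.$$

That is, for every $x_1, x_2 \in K$ and every $\lambda \in (0, 1)$, the convex combination
$\lambda x_1 + (1 - \lambda)x_2 \in K$. Hence $K$ is a convex set in $\mathbb{R}^n$. □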


The next question is how to describe the extreme points of a feasible solution
set defined by $K = \{x \in \mathbb{R}^n : Ax = b, \text{ and } x \geq 0\}$.


Theorem 9.1.4. Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = m < n$
and $b \in \mathbb{R}^m$. Let

$$K = \{x \in \mathbb{R}^n : Ax = b, \text{ and } x \geq 0\}.$$

Then $x \in K$ is an extreme point of $K$ if and only if $x =
(x_1, x_2, \cdots, x_n) \in K$ has at most $m$ nonzero coordinates and the
columns of $A$ corresponding to the nonzero coordinates of $x$ are
linearly independent.


Proof. "$\Rightarrow$" Suppose that $x = (x_1, x_2, \cdots, x_n) \in K$ is an extreme point of
$K$. Without loss of generality, assume that the first $k$ coordinates of $x$ are
nonzero. Then we have

$$x_1 c_1 + x_2 c_2 + \cdots + x_k c_k = b,$$

where $c_1, c_2, \cdots, c_k$ are the corresponding columns of $A$ with $A = [c_1 :
c_2 : \cdots : c_n]$. We show that $c_1, c_2, \cdots, c_k$ are linearly independent and hence
$k \leq m = \operatorname{rank}(A)$.
Suppose not. Then there exists $(y_1, y_2, \cdots, y_k) \neq 0$ such that
$$y_1 c_1 + y_2 c_2 + \cdots + y_k c_k = 0.$$

Let $y = (y_1, y_2, \cdots, y_k, 0, \cdots, 0)$. Then there exists $\epsilon > 0$ small enough
such that

$$x + \epsilon y \geq 0, \qquad x - \epsilon y \geq 0,$$
since $x \geq 0$ and $x_i > 0$ for $1 \leq i \leq k$. Moreover, we have $A(x + \epsilon y) = b =
A(x - \epsilon y)$, which implies that $x + \epsilon y, x - \epsilon y \in K$ and

$$x = \frac{1}{2}(x + \epsilon y) + \frac{1}{2}(x - \epsilon y).$$

That is, $x$ is a convex combination of two distinct points in $K$ and $x$ is not
an extreme point in $K$. This is a contradiction.
"$\Leftarrow$" Without loss of generality, let the first $k \leq m$ coordinates of $x$
be nonzero with $x = (x_1, x_2, \cdots, x_k, 0, \cdots, 0)$ and let the first $k$ columns of
$A = [c_1 : c_2 : \cdots : c_n]$ be linearly independent. Then we have
$$x_1 c_1 + x_2 c_2 + \cdots + x_k c_k = b. \tag{9.4}$$

Suppose, for contradiction, that $x$ is not an extreme point. Then there exist
$y, z \in K$ with $y \neq z$ and $0 < s < 1$ such that

$$x = sy + (1 - s)z.$$

Since $y \geq 0$, $z \geq 0$ and the last $n - k$ coordinates of $x$ are zero, it follows
that the last $n - k$ coordinates of $y$ and $z$ are zero, too. Then we have
$y = (y_1, y_2, \cdots, y_k, 0, \cdots, 0) \in K$, $z = (z_1, z_2, \cdots, z_k, 0, \cdots, 0) \in K$ and
$$y_1 c_1 + y_2 c_2 + \cdots + y_k c_k = b, \qquad z_1 c_1 + z_2 c_2 + \cdots + z_k c_k = b. \tag{9.5}$$
Since $\{c_1, c_2, \cdots, c_k\}$ are linearly independent, by (9.4) and (9.5) we have
$$x = y = z.$$
This is a contradiction. □
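
The criterion of Theorem 9.1.4 translates directly into a rank test. The sketch below, in Python with NumPy, is only an illustration under our own assumptions. The function name `is_extreme_point` and the tolerance `tol` are hypothetical, and membership in $K$ is tested in floating point.

```python
import numpy as np

def is_extreme_point(A, b, x, tol=1e-9):
    """Check whether x is an extreme point of K = {x : Ax = b, x >= 0}."""
    A, b, x = (np.asarray(v, dtype=float) for v in (A, b, x))
    if np.any(x < -tol) or not np.allclose(A @ x, b, atol=tol):
        return False                          # x does not belong to K
    support = np.flatnonzero(x > tol)         # indices of the nonzero coordinates
    if support.size == 0:
        return True                           # no columns to test
    # The selected columns are linearly independent exactly when their rank
    # equals their number (which then cannot exceed m).
    return bool(np.linalg.matrix_rank(A[:, support]) == support.size)

# Feasible region of Example 9.1.1: x1 + x2 + x3 = 1, x >= 0.
A = [[1, 1, 1]]
b = [1]
print(is_extreme_point(A, b, [0, 0, 1]))      # a corner of the triangle
print(is_extreme_point(A, b, [0.5, 0.5, 0]))  # midpoint of an edge
```

The corner $(0, 0, 1)$ passes the test, while the midpoint $(1/2, 1/2, 0)$ fails because its two nonzero coordinates select two columns of a matrix of rank 1.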


Definition 9.1.5. Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) =
m < n$ and $b \in \mathbb{R}^m$. Let

$$K = \{x \in \mathbb{R}^n : Ax = b, \text{ and } x \geq 0\}.$$

Let $B$ be an $m \times m$ invertible submatrix of $A$ whose columns constitute
a basis for the column space of $A$. If $x \in K$ assumes zero values for
all $n - m$ coordinates not associated with $B$, $x$ is called a basic feasible
solution with respect to the basis $B$. The coordinates of a basic
feasible solution $x$ associated with $B$ are called basic variables. The
coordinates of a basic feasible solution $x$ not associated with $B$ are called
nonbasic variables.

The conclusion of Theorem 9.1.4 implies that $x \in K$ is an extreme point if and
only if $x$ is a basic feasible solution with respect to some basis of the column
space of $A$.

Now we discuss how a basic feasible solution is related to optimality. 


Theorem 9.1.6. Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = m < n$
and $b \in \mathbb{R}^m$. Let

$$K = \{x \in \mathbb{R}^n : Ax = b, \text{ and } x \geq 0\}.$$

i) If $K \neq \emptyset$, then there exists a basic feasible solution $x \in K$;


ii) If $K$ contains an optimal solution, then it contains a basic feasible
solution which is optimal.


Proof. i) Suppose that $K \neq \emptyset$. Then there exists $x = (x_1, x_2, \cdots, x_n) \in K$
such that
$$x_1 c_1 + x_2 c_2 + \cdots + x_n c_n = b,$$

where $A = [c_1 : c_2 : \cdots : c_n]$. Assume that $x$ has $k$ nonzero coordinates.
Without loss of generality, we assume that

$$x_1 c_1 + x_2 c_2 + \cdots + x_k c_k = b. \tag{9.6}$$
If $\{c_1, c_2, \cdots, c_k\}$ is linearly independent, then $k \leq m$ and $x$ is a basic feasible
solution.
If $\{c_1, c_2, \cdots, c_k\}$ is linearly dependent, there exists a nontrivial linear
combination of $\{c_1, c_2, \cdots, c_k\}$ such that
$$y_1 c_1 + y_2 c_2 + \cdots + y_k c_k = 0, \tag{9.7}$$
where, replacing every $y_i$ by $-y_i$ if necessary, we may assume that at least one
$y_i$ is positive. Then for every $\epsilon \in \mathbb{R}$, we obtain from (9.6) and (9.7) that
$$(x_1 - \epsilon y_1)c_1 + (x_2 - \epsilon y_2)c_2 + \cdots + (x_k - \epsilon y_k)c_k = b.$$

Let

$$\epsilon_0 = \min_{1 \leq i \leq k} \left\{ \frac{x_i}{y_i} : y_i > 0 \right\} > 0.$$

Then $x - \epsilon_0 y \in K$, where $y = (y_1, y_2, \cdots, y_k, 0, \cdots, 0)$, and $x - \epsilon_0 y$ has at most
$k - 1$ positive coordinates. By the same token we repeat the same process on
$x - \epsilon_0 y \in K$ to obtain a feasible solution $x - \epsilon' y \in K$ until the corresponding
columns of $A$ are linearly independent and hence $x - \epsilon' y$ is a basic feasible solution.

ii) Let $x = (x_1, x_2, \cdots, x_n) \in K$ be an optimal feasible solution. Assume
that $x$ has $k$ nonzero coordinates. Without loss of generality, we assume that

$$x_1 c_1 + x_2 c_2 + \cdots + x_k c_k = b. \tag{9.8}$$
If the columns $\{c_1, c_2, \cdots, c_k\}$ of $A$ are linearly independent, then $k \leq m$
and $x$ is a basic feasible solution and is optimal.
If the columns $\{c_1, c_2, \cdots, c_k\}$ of $A$ are linearly dependent, by the same
procedure as for the proof of i), we can reduce $x$ into a basic feasible solution.
It remains to show that $x - \epsilon y$ is optimal whenever it is feasible. Note that
$x - \epsilon y$ is a feasible solution for every $\epsilon$ with $|\epsilon|$ small enough, and the value
of the objective function at $x - \epsilon y$ is
$$d^T x - \epsilon d^T y.$$

If $d^T y \neq 0$, we can choose a small $\epsilon \neq 0$ with the same sign as $d^T y$ so that
$d^T x - \epsilon d^T y < d^T x$ and $x - \epsilon y$ is a feasible solution with a smaller objective
value. This is a contradiction. Hence $d^T y = 0$ and $x - \epsilon y$ is
an optimal feasible solution for every feasible $\epsilon$, which can be reduced into an optimal
basic feasible solution. □


Theorems 9.1.4 and 9.1.6 imply that we need only to search among the 
extreme points, or, equivalently, the basic feasible solutions for optimal solu- 
tions, instead of searching among infinitely many feasible solutions. For small 
scale problems we may visualize the feasible region to locate the extreme 
points and find the optimal solution. In the next section, we introduce the 
simplex method for solving linear programming problems which includes an 
approach to transfer from one extreme point to another without geometrical 
visualization of the feasible region. 
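
For small problems, the consequence of Theorems 9.1.4 and 9.1.6 can be tried directly: list every basic feasible solution and compare objective values. The Python sketch below (NumPy assumed) is a brute-force illustration, not the simplex method. The names `basic_feasible_solutions` and `minimize_over_vertices` are our own, and the comparison is only meaningful when the problem is known to have an optimal solution.

```python
import itertools
import numpy as np

def basic_feasible_solutions(A, b, tol=1e-9):
    """List the basic feasible solutions of Ax = b, x >= 0 (A is m x n, rank m)."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    m, n = A.shape
    found = []
    for cols in itertools.combinations(range(n), m):
        B = A[:, list(cols)]                       # candidate basis
        if abs(np.linalg.det(B)) < tol:
            continue                               # columns are dependent
        xB = np.linalg.solve(B, b)
        if np.any(xB < -tol):
            continue                               # basic but not feasible
        x = np.zeros(n)
        x[list(cols)] = xB
        if not any(np.allclose(x, y) for y in found):
            found.append(x)                        # skip degenerate repeats
    return found

def minimize_over_vertices(A, b, d):
    """Return a basic feasible solution with the smallest d^T x, or None."""
    d = np.asarray(d, dtype=float)
    candidates = basic_feasible_solutions(A, b)
    return min(candidates, key=lambda x: float(d @ x)) if candidates else None

# Example 9.1.1: minimize x1 + x2 subject to x1 + x2 + x3 = 1, x >= 0.
best = minimize_over_vertices([[1, 1, 1]], [1], [1, 1, 0])
print(best)
```

The number of candidate bases grows like $\binom{n}{m}$, which is why the next section replaces this enumeration by a guided walk between extreme points.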


Exercise 9.1.7. 
1. Determine whether the following problem has a feasible solution. 


minimize £1 + £2 
T 


1 1 1 


subject to E 1-1 


Tı 1 
| t2| = +] „and x = (#1, £2, £3) > 0. 
23 


2. Draw a graph of the feasible region of the following optimization problem
and find an optimal solution.

$$\begin{aligned}
&\underset{x}{\text{minimize}} && x_1 - x_2 \\
&\text{subject to} && \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 1, \text{ and } x = (x_1, x_2, x_3) \geq 0.
\end{aligned}$$

3. Show that every vector space is a convex set.
4. Let $V$ be a real vector space. Let $A = \{x_1, x_2, \cdots, x_n\}$ be a subset of $V$
and
$$\operatorname{co}(A) = \left\{ \sum_{i=1}^{n} \lambda_i x_i : \sum_{i=1}^{n} \lambda_i = 1, \ \lambda_i \geq 0, \ i = 1, 2, \cdots, n \right\}.$$

Show that $\operatorname{co}(A)$ is a convex set. (We call it the convex hull of $A$.)
5. Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = m < n$ and $b \in \mathbb{R}^m$. Let
$$K = \{x \in \mathbb{R}^n : Ax = b, \text{ and } x \geq 0\}.$$
Show that there are finitely many extreme points in $K$.


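6. Let $A$ be an $m \times n$ real matrix, $b, y \in \mathbb{R}^m$ and $d, x \in \mathbb{R}^n$. Consider the
following two optimization problems:

$$\begin{aligned}
&\underset{x}{\text{minimize}} && d^T x \\
&\text{subject to} && Ax \leq b, \text{ and } x \geq 0
\end{aligned} \tag{A}$$

and

$$\begin{aligned}
&\underset{(x, y)}{\text{minimize}} && d^T x \\
&\text{subject to} && [A : I] \begin{bmatrix} x \\ y \end{bmatrix} = Ax + y = b, \text{ and } x \geq 0, \ y \geq 0.
\end{aligned} \tag{B}$$

Show that $x^*$ is an optimal solution of (A) if and only if $(x^*, 0)$ is an optimal
solution of (B).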


9.2 Simplex method 


In the last section, we learned that optimal solutions of linear program- 
ming are among the extreme points, or, equivalently, among the basic feasible 
solutions. The simplex method for solving linear programming problems 
developed by George B. Dantzig in 1947 is a procedure which transfers from 
a basic feasible solution to another until an optimality condition is satisfied. 


Initial basic feasible solution 


The first question is how to find the initial basic feasible solution in order
to begin the procedure of the simplex method. Let $A$ be an $m \times n$ real matrix
with $\operatorname{rank}(A) = m < n$, $b \in \mathbb{R}^m$ and $d, x \in \mathbb{R}^n$. Consider

$$\begin{aligned}
&\underset{x}{\text{minimize}} && d^T x \\
&\text{subject to} && Ax = b, \text{ and } x \geq 0.
\end{aligned} \tag{9.9}$$

We assume $b \geq 0$ since we may multiply the corresponding equation by minus
one if there exists a negative coordinate of $b$. However, it is not obvious how to
identify a basic feasible solution for problem (9.9). So we consider the following
auxiliary problem

$$\begin{aligned}
&\underset{(x, u)}{\text{minimize}} && \sum_{i=1}^{m} u_i \\
&\text{subject to} && Ax + u = b, \text{ and } x \geq 0, \ u \geq 0.
\end{aligned} \tag{9.10}$$

The auxiliary problem (9.10) has a trivial basic feasible solution $(x, u) = (0, b)$
from which we may proceed with the simplex method to find an optimal
solution if it exists.

If the minimum value of (9.10) is zero with $u = 0$, then it has a basic
feasible solution $(x^*, 0)$ where $x^*$ is a basic feasible solution of problem (9.9).
If the minimum value of (9.10) is nonzero then $u \neq 0$ and (9.9) has no feasible
solution because, otherwise, (9.10) should have achieved zero minimum.


Transfer of extreme points 


In the following, we illustrate how to transfer from one basic feasible
solution to another. For notational convenience, we assume that
$A = [I_m : c_{m+1} : \cdots : c_n]$ and problem (9.9) has a basic feasible solution
$(x_B, 0) \in \mathbb{R}^n$ where $x_B = (x_1, x_2, \cdots, x_m)$. (In practice the columns of $I_m$
may appear in any column of $A$.) Then we have

$$x_1 c_1 + x_2 c_2 + \cdots + x_m c_m = b. \tag{9.11}$$
Noticing that $\{c_1, c_2, \cdots, c_m\}$ are linearly independent, each column $c_p$, $p >
m$, can be written as a linear combination of the columns $\{c_1, c_2, \cdots, c_m\}$,
which are a basis for the column space of $A$. Namely,
$$a_{1p} c_1 + a_{2p} c_2 + \cdots + a_{mp} c_m = c_p. \tag{9.12}$$
For every $\epsilon > 0$, multiplying (9.12) by $\epsilon$ and subtracting from (9.11) we have

$$(x_1 - \epsilon a_{1p})c_1 + (x_2 - \epsilon a_{2p})c_2 + \cdots + (x_m - \epsilon a_{mp})c_m + \epsilon c_p = b. \tag{9.13}$$

If $\epsilon = 0$, we have the old basic feasible solution. As $\epsilon$ increases from zero,
the coefficients of the linear combination in (9.13) remain nonnegative until $\epsilon$ reaches
the value

$$\epsilon_0 = \min_{1 \leq i \leq m} \left\{ \frac{x_i}{a_{ip}} : a_{ip} > 0 \right\}. \tag{9.14}$$

Let $i_0$ be the row index where $\epsilon_0$ is achieved. Then the basis vector $c_{i_0}$ at
the $i_0$-th column is to be moved out. Since $a_{i_0 p} > 0$, $\{c_1, c_2, \cdots, c_m, c_p\} \setminus \{c_{i_0}\}$
is linearly independent and we obtain a new basic feasible solution:

$$(x_1 - \epsilon_0 a_{1p}, \cdots, x_{i_0 - 1} - \epsilon_0 a_{i_0 - 1, p}, 0, x_{i_0 + 1} - \epsilon_0 a_{i_0 + 1, p}, \cdots, x_m - \epsilon_0 a_{mp}, 0, \cdots, 0, \epsilon_0, 0, \cdots, 0), \tag{9.15}$$

where $\epsilon_0$ is the $p$-th coordinate. Let us use an example to show the above
process.


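Example 9.2.1. Consider the system $Ax = b$ with augmented matrix given
by
$$\begin{array}{rrrrr|r}
c_1 & c_2 & c_3 & c_4 & c_5 & b \\ \hline
1 & 0 & 0 & -2 & 1 & 3 \\
0 & 1 & 0 & 4 & 1 & 2 \\
0 & 0 & 1 & 5 & -3 & 1
\end{array}$$

where $x = (3, 2, 1, 0, 0)$ is a basic feasible solution with the corresponding
columns $\{c_1, c_2, c_3\}$ linearly independent. Namely, $x_1, x_2, x_3$ are basic
variables. If we want to bring $c_5$ into the basis for the column space, we choose

$$\epsilon_0 = \min_i \left\{ \frac{x_i}{a_{i5}} : a_{i5} > 0 \right\} = \min\left\{ \frac{3}{1}, \frac{2}{1} \right\} = 2,$$

which is achieved at $x_2 = 2$, $a_{25} = 1$. Namely $i_0 = 2$. $c_2$ will be removed and
$\{c_1, c_3, c_5\}$ is the new basis. Using the expressions at (9.15), the new basic
feasible solution is

$$(3 - 2 \cdot 1, \ 2 - 2 \cdot 1, \ 1 - 2 \cdot (-3), \ 0, \ 2) = (1, 0, 7, 0, 2).$$

If we use the pivot $a_{25}$ in $c_5$ to reduce other entries in the column to zero, we
obtain

$$\begin{array}{rrrrr|r}
c_1 & c_2 & c_3 & c_4 & c_5 & b \\ \hline
1 & -1 & 0 & -6 & 0 & 1 \\
0 & 1 & 0 & 4 & 1 & 2 \\
0 & 3 & 1 & 17 & 0 & 7
\end{array}$$

The new basic variables are $x_1, x_3, x_5$ whose values are contained in the last
column for $b$. □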
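
One transfer of extreme points consists of a ratio test followed by a pivot, which can be sketched as follows in Python with NumPy. The function names `ratio_test` and `pivot` are our own. The column $c_5 = (1, 1, -3)$ and $b = (3, 2, 1)$ come from the example, while the column $c_4 = (-2, 4, 5)$ is an assumption obtained by undoing the row operations on the second tableau, since it plays no role in the ratio test.

```python
import numpy as np

def ratio_test(T, p, tol=1e-12):
    """Return (eps0, i0) of (9.14) for entering column p of T = [A : b].

    Returns None when no entry a_ip is positive.
    """
    rows = [i for i in range(T.shape[0]) if T[i, p] > tol]
    if not rows:
        return None
    i0 = min(rows, key=lambda i: T[i, -1] / T[i, p])
    return T[i0, -1] / T[i0, p], i0

def pivot(T, i0, p):
    """Return a copy of T in which column p is the unit column with 1 in row i0."""
    T = np.array(T, dtype=float)
    T[i0] /= T[i0, p]
    for i in range(T.shape[0]):
        if i != i0:
            T[i] -= T[i, p] * T[i0]
    return T

# Augmented matrix [A : b] of Example 9.2.1 (column c4 is assumed, see above).
T = np.array([[1., 0., 0., -2., 1., 3.],
              [0., 1., 0., 4., 1., 2.],
              [0., 0., 1., 5., -3., 1.]])
eps0, i0 = ratio_test(T, 4)        # bring c5 (index 4) into the basis
T_new = pivot(T, i0, 4)
print(eps0, i0 + 1)                # epsilon_0 and the 1-based row index i_0
print(T_new[:, -1])                # values of the new basic variables
```

The last column of `T_new` holds the values $1, 2, 7$ of the basic variables $x_1, x_5, x_3$ in row order, in agreement with the new basic feasible solution $(1, 0, 7, 0, 2)$.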


Remark 9.2.2. From Example 9.2.1, we know that if the augmented matrix
$[A : b]$ is in the reduced row echelon form, there is a basic feasible solution
contained in the last column. During the process of transferring from one
extreme point to another, if we reduce the basis vectors to have only one nonzero
entry, then a basic feasible solution can be constructed according to the
order of the basic variables. For instance, in the second tableau, $\{c_1, c_3, c_5\}$ are
basis columns and the corresponding basic feasible solution is $(1, 0, 7, 0, 2)$.


Remark 9.2.3. If none of the $a_{ip}$'s in (9.14) is positive, then none of the
coefficients of the column vectors in (9.13) decreases as $\epsilon$ increases, and no new
basic feasible solution can be identified. However, this means that there exist
feasible solutions with arbitrarily large coefficients and the feasible region $K$
is unbounded.


Remark 9.2.4. It may happen that $\epsilon_0 = \dfrac{x_{i_0}}{a_{i_0 p}} = 0$ in (9.14), which implies
that the new and old basic variables during the transfer are both zero, and
the objective function will not change value. In such a case the process of
carrying out the simplex method may enter into a cycle. We call this case a
degenerate case. However, cycling is not common and can be avoided in
practice, for example by an anti-cycling pivoting rule such as Bland's rule.


Optimality condition 


After we learn how to transfer from one extreme point to another, we need 
to know when optimality has been achieved so that the process should stop. 

Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = m < n$, $b \in \mathbb{R}^m$ and
$d, x \in \mathbb{R}^n$. Consider problem (9.9). Suppose that $(x_B, 0)$ with $x_B \in \mathbb{R}^m$ is a
basic feasible solution and that $A = [I_m : c_{m+1} : \cdots : c_n]$, which is achievable
using elementary row operations and/or renaming of the variables.

The value of the objective function at $(x_B, 0) = (b, 0)$ is

$$z_0 = d_B^T b,$$
where $d_B = [d_1, d_2, \cdots, d_m]^T$. To justify that the current basic feasible solution
$x = (x_B, 0)$ is optimal, we need to show that any other possible feasible solution
will not lower the value of the objective function. Let $\{e_1, e_2, \cdots, e_m\}$ be the
standard basis of $\mathbb{R}^m$. We have for every feasible solution $x = (x_1, x_2, \cdots, x_n)$
of $Ax = b$ with $A = [I_m : c_{m+1} : \cdots : c_n]$,
$$x_1 e_1 + x_2 e_2 + \cdots + x_m e_m = b - x_{m+1} c_{m+1} - x_{m+2} c_{m+2} - \cdots - x_n c_n. \tag{9.16}$$
Multiplying both sides of (9.16) with $d_B^T$, we have
$$\begin{aligned}
\sum_{i=1}^{m} d_i x_i &= d_1 x_1 + d_2 x_2 + \cdots + d_m x_m \\
&= d_B^T b - x_{m+1} d_B^T c_{m+1} - x_{m+2} d_B^T c_{m+2} - \cdots - x_n d_B^T c_n \\
&= z_0 - x_{m+1} d_B^T c_{m+1} - x_{m+2} d_B^T c_{m+2} - \cdots - x_n d_B^T c_n.
\end{aligned} \tag{9.17}$$
Then we have

$$d^T x = z_0 + (d_{m+1} - d_B^T c_{m+1}) x_{m+1} + (d_{m+2} - d_B^T c_{m+2}) x_{m+2} + \cdots + (d_n - d_B^T c_n) x_n. \tag{9.18}$$

Notice that any other feasible solution $x$ satisfies $x \geq 0$. If $r_j = d_j - d_B^T c_j \geq 0$
for every $j \in \{1, 2, \cdots, n\}$, we have the value of the objective function:

$$z = d^T x \geq z_0.$$

Namely $z_0$ is the optimal value achieved at the current basic feasible solution
$(x_B, 0)$. We have arrived at the following optimality condition theorem.


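Theorem 9.2.5. Let $A = [c_1 : c_2 : \cdots : c_n]$ be an $m \times n$ real matrix
with $\operatorname{rank}(A) = m < n$, $b \in \mathbb{R}^m$, $d, x \in \mathbb{R}^n$ and $d = [d_1, d_2, \cdots, d_n]^T$.
Consider the linear programming problem

$$\begin{aligned}
&\underset{x}{\text{minimize}} && d^T x \\
&\text{subject to} && Ax = b, \text{ and } x \geq 0.
\end{aligned}$$

If $(x_B, 0) \in \mathbb{R}^n$ is a basic feasible solution, and $r_j = d_j - d_B^T c_j \geq 0$ for
every $j \in \{1, 2, \cdots, n\}$, then $(x_B, 0)$ is an optimal solution.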


Remark 9.2.6. Suppose that the initial basic variables correspond to
the first $m$ columns of $A$, which form the $m \times m$ identity matrix $I_m$; then for
$j = 1, 2, \cdots, m$, we have

$$r_j = d_j - d_B^T c_j = d_j - d_j = 0.$$
Namely, the $r_j$ corresponding to basic variables are zero. □
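
The numbers $r_j$ of Theorem 9.2.5 can be computed without first reducing the basis columns to the identity. The Python sketch below (NumPy assumed) uses the formula $r_j = d_j - d_B^T B^{-1} c_j$, where $B$ is the matrix of basis columns. This is the standard generalization and reduces to $r_j = d_j - d_B^T c_j$ when $B = I_m$; the function name `reduced_costs` is our own, and the data are those of problem (9.20) in Section 9.3.

```python
import numpy as np

def reduced_costs(A, d, basis):
    """Return the vector r with r_j = d_j - d_B^T B^{-1} c_j.

    basis lists the column indices (0-based) of the basic variables.
    """
    A, d = np.asarray(A, dtype=float), np.asarray(d, dtype=float)
    B = A[:, basis]                      # basis columns
    y = np.linalg.solve(B.T, d[basis])   # y^T = d_B^T B^{-1}
    return d - y @ A

# minimize 2x + 3y subject to x + 2y + z = 8, 3x + 4y + z = 18, x, y, z >= 0
A = [[1, 2, 1], [3, 4, 1]]
d = [2, 3, 0]
r_xy = reduced_costs(A, d, [0, 1])       # basic variables x, y
r_xz = reduced_costs(A, d, [0, 2])       # basic variables x, z
print(r_xy, bool(np.all(r_xy >= -1e-9)))
print(r_xz, bool(np.all(r_xz >= -1e-9)))
```

With the basis $\{x, y\}$ the entry for $z$ is negative, so the optimality condition fails. With the basis $\{x, z\}$ all entries are nonnegative, and the entries belonging to basic variables vanish as stated in Remark 9.2.6.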
Exercise 9.2.7. 


1. Let $A$ be an $m \times n$ real matrix with $\operatorname{rank}(A) = m < n$, $b \in \mathbb{R}^m$ and
$d, x \in \mathbb{R}^n$. Consider

$$\begin{aligned}
&\underset{x}{\text{minimize}} && d^T x \\
&\text{subject to} && Ax = b, \text{ and } x \geq 0.
\end{aligned}$$

Let $x$ be a basic feasible solution with respect to a basis $B$ which is an $m \times m$
submatrix of $A$ and each column of $B$ contains only one nonzero entry which
is 1. Suppose that the $i_0$-th column of $A$ contained in $B$ is replaced with
the $p$-th column of $A$ not contained in $B$ in order to transfer from one basic
feasible solution $x = x_0$ to another basic feasible solution, resulting in a new
equivalent system $A'x = b'$ where the new basic feasible solution corresponds
to a basis each column of which contains only one nonzero entry 1. Show that

$$a'_{ij} = a_{ij} - \frac{a_{ip}}{a_{i_0 p}} a_{i_0 j}, \quad i \neq i_0, \qquad a'_{i_0 j} = \frac{a_{i_0 j}}{a_{i_0 p}}.$$


9.3 Simplex tableau 


We have discussed the theory for the simplex method in the last section.
We discuss in this section the technical details of how to carry out the algorithm
of the simplex method. Since the method for the transfer of extreme points
of $Ax = b$, $x \geq 0$ was shown in the last section on the augmented matrix, what
is left is the details on how to check the optimality conditions at each step of
the transfer.

The optimality condition is derived from the objective function $z = d^T x$,
or explicitly

$$d_1 x_1 + d_2 x_2 + \cdots + d_n x_n - z = 0. \tag{9.19}$$

If we want to append this equation to the augmented matrix $[A : b]$, then we
need a separate column to record the coefficients of $z$ during elementary row
operations. However, this is not necessary since if we do include the coefficients
of $z$ with $Ax = Ax + 0z = b$, the coefficients of $z$ in the rows of $A$ will be all zeros
and the only nonzero coefficient of $z$ would always be $-1$ in the last row. Namely,
if we do carry coefficients of $z$, the corresponding column of coefficients will always be
$[0, 0, \cdots, 0, -1]^T$.

At the initial tableau, we append the coefficients $d$ of $x$ to the augmented
matrix $[A : b]$ and place the right hand side 0 of (9.19) at the last column.
Using elementary row operations we can eliminate the basic variables from the
objective function so that we have (9.18) and the first tableau for the linear
programming problem, which is called a simplex tableau. Namely, we have

$$r_{m+1} x_{m+1} + r_{m+2} x_{m+2} + \cdots + r_n x_n - z = -z_0,$$

from which we check the optimality condition whether $r_j \geq 0$ for every $j =
1, 2, \cdots, n$. Let us use a concrete example to show the implementation of
the simplex method, which contains two phases: Phase I shows how to find
the initial basic feasible solution; Phase II shows how we begin with the last
tableau of Phase I to find the optimal solution of the linear programming
problem. We summarize the algorithm after the example.
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Example 9.3.1. 


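$$\begin{aligned}
&\text{minimize} && 2x + 3y \\
&\text{subject to} && x + 2y + z = 8, \\
&&& 3x + 4y + z = 18, \\
&&& x, y, z \geq 0.
\end{aligned} \tag{9.20}$$

Solution: To find the initial basic feasible solution, we consider the following
auxiliary problem:

$$\begin{aligned}
&\text{minimize} && u_1 + u_2 \\
&\text{subject to} && x + 2y + z + u_1 = 8, \\
&&& 3x + 4y + z + u_2 = 18, \\
&&& x, y, z, u_1, u_2 \geq 0.
\end{aligned} \tag{9.21}$$

The initial simplex tableau is

$$\begin{array}{rrrrr|r}
c_1 & c_2 & c_3 & c_4 & c_5 & b \\ \hline
1 & 2 & 1 & 1 & 0 & 8 \\
3 & 4 & 1 & 0 & 1 & 18 \\ \hline
0 & 0 & 0 & 1 & 1 & 0
\end{array}$$

where the last row stands for the objective function $u_1 + u_2 - f = 0$. We use
elimination to reduce the entries of the last row under the basic variables to zero.
Then the first simplex tableau is

$$\begin{array}{l|rrrrr|r}
 & c_1 & c_2 & c_3 & c_4 & c_5 & b \\ \hline
 & 1 & 2 & 1 & 1 & 0 & 8 \\
 & (3) & 4 & 1 & 0 & 1 & 18 \\ \hline
R_3 - R_1 - R_2 & -4 & -6 & -2 & 0 & 0 & -26
\end{array}$$

$\{c_4, c_5\}$ is the current basis with basic feasible solution $(0, 0, 0, 8, 18)$.
To have a basic feasible solution of (9.20), we use two steps to remove $\{c_4, c_5\}$ from
the basis. First we bring $c_1$ into the basis,

$$\epsilon = \min_i \left\{ \frac{x_i}{a_{i1}} : a_{i1} > 0 \right\} = \min\left\{ \frac{8}{1}, \frac{18}{3} \right\} = 6,$$

which is achieved at $i_0 = 2$. Note that in this case the basic variables are not
in the first columns and the subscript $i$ for $x_i$ indicates the value of the basic
variable in the $i$-th row.
Using the pivot marked by parentheses in the first simplex tableau, we have

$$\begin{array}{l|rrrrr|r}
 & c_1 & c_2 & c_3 & c_4 & c_5 & b \\ \hline
R_1 - R_2 & 0 & (\tfrac{2}{3}) & \tfrac{2}{3} & 1 & -\tfrac{1}{3} & 2 \\
\tfrac{1}{3} R_2 & 1 & \tfrac{4}{3} & \tfrac{1}{3} & 0 & \tfrac{1}{3} & 6 \\ \hline
R_3 + 4R_2 & 0 & -\tfrac{2}{3} & -\tfrac{2}{3} & 0 & \tfrac{4}{3} & -2
\end{array}$$

where the row operations use the rescaled second row. Next we bring $c_2$ into the basis,

$$\epsilon = \min_i \left\{ \frac{x_i}{a_{i2}} : a_{i2} > 0 \right\} = \min\left\{ \frac{2}{2/3}, \frac{6}{4/3} \right\} = 3,$$

which is achieved at $i_0 = 1$. Using the pivot $\tfrac{2}{3}$, we have

$$\begin{array}{rrrrr|r}
c_1 & c_2 & c_3 & c_4 & c_5 & b \\ \hline
0 & 1 & 1 & \tfrac{3}{2} & -\tfrac{1}{2} & 3 \\
1 & 0 & -1 & -2 & 1 & 2 \\ \hline
0 & 0 & 0 & 1 & 1 & 0
\end{array}$$

Therefore $(2, 3, 0, 0, 0)$ is a basic feasible solution such that the auxiliary
problem (9.21) is minimized with objective function value 0. Hence $(2, 3, 0)$
is a basic feasible solution of the original problem (9.20).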

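Notice that the columns $c_1, c_2, c_3$ and $b$ are common for both (9.20) and
(9.21). Therefore we can reuse them for the next steps. We remove the columns
for $c_4$ and $c_5$ and update the last row by the coefficients of the new objective
function:

$$\begin{array}{rrr|r}
c_1 & c_2 & c_3 & b \\ \hline
0 & 1 & 1 & 3 \\
1 & 0 & -1 & 2 \\ \hline
2 & 3 & 0 & 0
\end{array}$$

Eliminating the entries of the last row below the basic variables, we have the first simplex
tableau:

$$\begin{array}{l|rrr|r}
 & c_1 & c_2 & c_3 & b \\ \hline
 & 0 & 1 & (1) & 3 \\
 & 1 & 0 & -1 & 2 \\ \hline
R_3 - 3R_1 - 2R_2 & 0 & 0 & -1 & -13
\end{array}$$

Since the optimality condition is not achieved with $r_3 = -1$, we bring $c_3$ into
the basis:
$$\epsilon = \min_i \left\{ \frac{x_i}{a_{i3}} : a_{i3} > 0 \right\} = \frac{3}{1} = 3,$$
which is achieved at $i_0 = 1$. Using the pivot marked by parentheses, we have

$$\begin{array}{l|rrr|r}
 & c_1 & c_2 & c_3 & b \\ \hline
 & 0 & 1 & 1 & 3 \\
R_2 + R_1 & 1 & 1 & 0 & 5 \\ \hline
R_3 + R_1 & 0 & 1 & 0 & -10
\end{array}$$

which implies that the optimality is achieved and the basic feasible solution
is $(5, 0, 3)$ with minimal objective function value 10. □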


In summary, we carried out the following algorithm for the linear program- 
ming problem (9.9): 


Step 1: Write the system Ax = b such that b > 0, remove any redun- 
dant equations such that rank(A) = m < n and formulate the 
auxiliary problem (9.10) for a basic feasible solution of (9.9); 


Step 2: Write the augmented matrix [A : Im : b] for Ax +u = b and place 
the coefficients d for the objective function d/a — z = 0 at the 


last row with zero in the column for b, ignoring coefficients for z. 
We call the matrix so obtained the initial tableau for (9.10); 


Use elementary row operations to reduce the nonzero entries un- 
der the columns of the basic variables u into zero. We call the 
matrix so obtained the first tableau for (9.10); 
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Transfer the extreme points so that u is no longer the basic vari- 
able. At each transfer, the column to be moved out and to be 
replaced by a specified column cp, 1 <p < n — Mis the one 
corresponding to the basic variable whose value is in row ¿y such 


that 
A Ti 
e= min | Zi E >0): 
2 Qip 


Check the optimality condition, that is, whether coefficients in 
the last row satisfy r; > 0 for 1 < j < n + m, and whether the 
objective function is zero. If the optimal solution is nonzero, (9.9) 
has no feasible solution. The algorithm stops. Otherwise, the first 
n—m coordinates of the basic feasible solution currently obtained 
are a basic feasible solution for (9.9); 


Delete the columns for u and the last row of the last tableau 
obtained in Step 5, and update the last row by the coefficients of 
the new objective function of (9.9). We obtain the initial tableau 
for (9.9); 


Use elementary row operations to reduce the nonzero entries un- 
der the columns of the basic variables into zero. We call the 
matrix so obtained the first tableau for (9.9); 


Step 8: Check the optimality condition on the current tableau, that is,
whether the coefficients in the last row satisfy $r_j \ge 0$ for $1 \le j \le n$.
If yes, then the values of the basic feasible solution are given in
the last column and the negative of the objective function value,
$-z_0$, is in the lower right corner;


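Step 9: Otherwise, transfer the extreme points to achieve optimality. At
each transfer, the column to be moved out and replaced by a
specified column $c_p$, $1 \le p \le n$, is the one corresponding to
the basic variable whose value is in row $i_0$ such that
\[ \frac{x_{i_0}}{a_{i_0 p}} = \min_i \left\{ \frac{x_i}{a_{ip}} : a_{ip} > 0 \right\}. \]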
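
As an illustration only, the procedure summarized above can be sketched in Python. The sketch assumes that A is an m × n matrix with rank(A) = m and that b ≥ 0, as in Step 1. The function names pivot, simplex and two_phase and the sample data are hypothetical and do not come from the text. The summary leaves the choice of the entering column open; the sketch uses Bland's smallest-index rule, and exact arithmetic with fractions.Fraction, both of which are implementation choices made here.

```python
from fractions import Fraction


def pivot(T, r, p):
    """Make column p a unit vector with its 1 in row r (elementary row operations)."""
    piv = T[r][p]
    T[r] = [v / piv for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][p] != 0:
            f = T[i][p]
            T[i] = [a - f * t for a, t in zip(T[i], T[r])]


def simplex(T, basis, ncols):
    """Pivot until the last-row coefficients r_j are >= 0 for all j < ncols."""
    m = len(T) - 1
    while True:
        # Entering column: smallest j with r_j < 0 (Bland's rule, an assumption).
        p = next((j for j in range(ncols) if T[-1][j] < 0), None)
        if p is None:
            return
        rows = [i for i in range(m) if T[i][p] > 0]
        if not rows:
            raise ValueError("objective is unbounded below")
        # Ratio test: minimize x_i / a_ip over a_ip > 0; ties go to the
        # row whose basic variable has the smallest index.
        r = min(rows, key=lambda i: (T[i][-1] / T[i][p], basis[i]))
        pivot(T, r, p)
        basis[r] = p


def two_phase(A, b, c):
    """Minimize c.x subject to A x = b, x >= 0 (assumes b >= 0, rank(A) = m).

    Returns (x, value), or None when there is no feasible solution.
    """
    m, n = len(A), len(A[0])
    # Step 2: tableau [A : I_m : b] with auxiliary objective u_1 + ... + u_m.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == k)) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    T.append([Fraction(0)] * n + [Fraction(1)] * m + [Fraction(0)])
    basis = list(range(n, n + m))
    # Step 3: clear the last-row entries under the basic columns u.
    for i in range(m):
        T[-1] = [a - t for a, t in zip(T[-1], T[i])]
    # Steps 4-5: solve the auxiliary problem and test its optimal value.
    simplex(T, basis, n + m)
    if T[-1][-1] != 0:
        return None
    # Degenerate case: pivot any u still in the basis (at value 0) out of it.
    for i in range(m):
        if basis[i] >= n:
            p = next(j for j in range(n) if T[i][j] != 0)
            pivot(T, i, p)
            basis[i] = p
    # Step 6: drop the u columns and install the original objective row.
    T = [row[:n] + row[-1:] for row in T[:m]]
    T.append([Fraction(v) for v in c] + [Fraction(0)])
    # Step 7: clear the last-row entries under the basic columns.
    for i in range(m):
        f = T[-1][basis[i]]
        T[-1] = [a - f * t for a, t in zip(T[-1], T[i])]
    # Steps 8-9: pivot to optimality; the corner entry holds -z_0.
    simplex(T, basis, n)
    x = [Fraction(0)] * n
    for i in range(m):
        x[basis[i]] = T[i][-1]
    return x, -T[-1][-1]


# Hypothetical data: minimize -x - 2y subject to
# x + y + s = 4, x + 3y + t = 6, with x, y, s, t >= 0.
x, z = two_phase([[1, 1, 1, 0], [1, 3, 0, 1]], [4, 6], [-1, -2, 0, 0])
print([str(v) for v in x], z)  # ['3', '1', '0', '0'] -5
```

For the sample data the sketch stops at the extreme point (3, 1, 0, 0) with objective value −5, and it returns None when the auxiliary problem has a nonzero optimal value.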


Exercise 9.3.2. 


1. Determine whether the following linear program has a basic feasible solu- 
tion: 


minimize 2x + 3y
subject to x + 2y + z = 8
           3x + 4y + z = 28
           x, y, z ≥ 0
2. Solve the following linear programming problem.


minimize x + y
subject to x + 2y + z = 8
           3x + 4y + z = 18
           x, y, z ≥ 0.


3. Solve the following linear programming problem. 


minimize 2x + 3y
subject to x + 2y ≥ [illegible]
           3x + 4y ≥ [illegible]
           x, y ≥ 0


4. Solve the following linear programming problem. 


maximize 2x + 3y
subject to x + 2y ≤ [illegible]
           3x + 4y ≤ [illegible]
           x, y ≥ 0


5. Solve the following linear programming problem. 


minimize 2x − 3y
subject to x + 2y + z = [illegible]
           3x + 4y + z = [illegible]
           x, y, z ≥ 0


6. Solve the following linear programming problem. 


maximize -2x + 3y 


subject to x+2y+z = 8 


           [remaining constraints illegible]

LU decomposition, 36
QR decomposition, 96 


algebraic multiplicity, 125 
augmented matrix, 17 


basic feasible solution, 204 
basic variables, 204 
bijection, 171 

block diagonal form, 184 


Cayley-Hamilton theorem, 130
characteristic polynomial, 121 
Cholesky decomposition, 150 
co-factor expansion, 109 
column space, 52 

convex combination, 203 
convex hull, 206 

coordinate vector, 64 

cosine formula, 6 

covariance matrix, 164 
Cramer’s rule, 113 


determinant, 102 
diagonalizable matrix, 120 
direct sum, 73 

dot product, 4 


eigenspace, 121 
eigenvalue, 120 
eigenvector, 120 
elementary matrices, 13 
even permutation, 106 
extreme points, 202 


feasible solution, 202 
first principal component, 165 


Fredholm alternative, 74 

full singular value decomposition, 
158 

fundamental solution matrix, 136 


Gauss-Jordan elimination, 20
Gaussian elimination, 20 
generalized eigenspace, 186 
geometrical multiplicity, 125 
Gershgorin disc, 129 
Gram-Schmidt process, 93 


Householder matrix, 42, 99 
Householder transformation, 99 
hyperplane, 50 


identity matrix, 9 

infinity norm, 134 

injective map, 171 

inner product, 72 

inner product space, 72 

invariant subspace, 77, 179 

inverse, 27 

inversions, 103 

invertible linear transformation, 
174 

invertible matrix, 27 


Jordan basis, 188 
Jordan block, 188 
Jordan chain, 188 
Jordan normal form, 184, 189 


kernel of linear transformation, 
175 


least squares solution, 87 
linear combination, 2
linear operator, 169, 172 
linear programming, 201 
linear space, 46 

linear transformation, 169 
linearity, 23 

linearly dependent, 11 
lower triangular matrix, 43 


matrix, 8 
mean-deviation form, 164 


nonbasic variables, 204 
normal system, 87 
nullspace, 51


objective function, 202 
odd permutation, 106 
one-to-one map, 171 

onto map, 171 

orthogonal, 72 

orthogonal basis, 92 
orthogonal complement, 74 
orthogonal matrix, 41 
orthonormal basis, 91 


parallelogram law, 84 
permutation matrix, 40 
pivot, 17 

polar decomposition, 163 
polynomial interpolation, 89 
positive definite matrix, 149 
positive semidefinite, 152 


principal component analysis, 164 
projection, 79
quadratic forms, 143 


range of linear transformation, 
175 

rank, 59 

Rayleigh quotient, 160 

reduced singular value 
decomposition, 158 

representation matrix of linear 
transformation, 172 

row space, 52 


Schur factorization, 148 
Schwarz inequality, 6 
similar matrices, 127 
singular matrix, 27 


singular value decomposition, 157 


singular values, 156 
skew-symmetric, 43 
spectral decomposition, 145 
surjective map, 171 
symmetric matrix, 37 


trace, 127 

transition matrix, 65 
transpositions, 103 
triangle inequality, 7 


upper triangular matrix, 43 


Vandermonde matrix, 89 
vector space, 46 


